This is the first exercise in a tutorial series introducing Lucene, the text search engine library. Source for the exercises in this series is available on GitHub, and the only prerequisite for running the initial exercises is Groovy 2.0. The texts indexed in the exercises come from Project Gutenberg.
This exercise walks through a Groovy script to show how simple it is to index a document and then search it for terms: the script indexes ‘The complete works of Shakespeare’ and allows a single-term search to be performed.
First we shall use a Groovy feature called @Grab, which adds the required Lucene dependencies to the script’s classpath.
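A minimal sketch of what the @Grab step might look like. The Lucene version (4.0.0) and the three module names are assumptions chosen to match the Groovy 2.0 era, not coordinates taken from the series' own source. The annotations are wrapped in @Grapes because older Groovy releases reject several @Grab annotations on the same node.

```groovy
// Assumed coordinates: Lucene 4.0.0 split into core, analyzers-common and queryparser.
@Grapes([
    @Grab('org.apache.lucene:lucene-core:4.0.0'),
    @Grab('org.apache.lucene:lucene-analyzers-common:4.0.0'),
    @Grab('org.apache.lucene:lucene-queryparser:4.0.0')
])
import org.apache.lucene.util.Version

// If the dependencies resolved, Lucene classes are now visible to the script.
def luceneVersion = Version.LUCENE_40
println "Lucene classes resolved: ${luceneVersion}"
```

The first run downloads the jars into the local Grape cache, so it needs network access; later runs reuse the cached copies.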
Next we shall create an IndexWriter using a StandardAnalyzer and a RAMDirectory, which holds the index in memory for the duration of the script.
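As a rough illustration, creating the writer could look like the sketch below. It assumes the Lucene 4.0 API, in which both the analyzer and the writer configuration take a Version argument; the variable names are ours rather than the original script's.

```groovy
// Assumed coordinates: Lucene 4.0.0 (not confirmed by the tutorial text).
@Grapes([
    @Grab('org.apache.lucene:lucene-core:4.0.0'),
    @Grab('org.apache.lucene:lucene-analyzers-common:4.0.0')
])
import org.apache.lucene.analysis.standard.StandardAnalyzer
import org.apache.lucene.index.IndexWriter
import org.apache.lucene.index.IndexWriterConfig
import org.apache.lucene.store.RAMDirectory
import org.apache.lucene.util.Version

// The analyzer decides how text is split into searchable terms.
def analyzer = new StandardAnalyzer(Version.LUCENE_40)

// RAMDirectory keeps the whole index in memory; it disappears when the script exits.
def directory = new RAMDirectory()

def writer = new IndexWriter(directory, new IndexWriterConfig(Version.LUCENE_40, analyzer))
def emptyCount = writer.numDocs()   // nothing has been added yet
println "Documents in the new index: ${emptyCount}"
writer.close()
```

An in-memory directory suits a throwaway script like this one; an index that should survive between runs would need a file-based directory instead.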
We shall then use an anonymous closure to add each line of ‘The complete works of Shakespeare’ into the index along with its associated lineNumber.
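The indexing step might be sketched as follows. Three well-known lines stand in for the Project Gutenberg file, whose location is not given here, and the field name `line` is an assumption; only `lineNumber` is named in the text above. Lucene 4.0.0 is again assumed.

```groovy
// Assumed coordinates: Lucene 4.0.0 (not confirmed by the tutorial text).
@Grapes([
    @Grab('org.apache.lucene:lucene-core:4.0.0'),
    @Grab('org.apache.lucene:lucene-analyzers-common:4.0.0')
])
import org.apache.lucene.analysis.standard.StandardAnalyzer
import org.apache.lucene.document.Document
import org.apache.lucene.document.Field
import org.apache.lucene.document.StoredField
import org.apache.lucene.document.TextField
import org.apache.lucene.index.IndexWriter
import org.apache.lucene.index.IndexWriterConfig
import org.apache.lucene.store.RAMDirectory
import org.apache.lucene.util.Version

// Stand-in for the complete works; the real script would read the Gutenberg file.
def sampleText = [
    'To be, or not to be, that is the question',
    'Friends, Romans, countrymen, lend me your ears',
    'Now is the winter of our discontent'
].join('\n')

def analyzer = new StandardAnalyzer(Version.LUCENE_40)
def writer = new IndexWriter(new RAMDirectory(), new IndexWriterConfig(Version.LUCENE_40, analyzer))

// eachLine(1) hands the closure every line plus a counter that starts at 1.
sampleText.eachLine(1) { line, lineNumber ->
    def doc = new Document()
    doc.add(new TextField('line', line, Field.Store.YES))         // analysed, searchable, stored
    doc.add(new StoredField('lineNumber', lineNumber as String))  // stored only, for display
    writer.addDocument(doc)
}

def indexedCount = writer.numDocs()
writer.close()
println "Indexed ${indexedCount} lines"
```

Each line becomes its own Lucene document, which is what lets a search report individual matching lines rather than the whole text.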
With the indexing of ‘The complete works of Shakespeare’ finished, it is now time to search for lines which contain a term.
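A hedged end-to-end sketch of the search step, again assuming Lucene 4.0.0 and the hypothetical field names `line` and `lineNumber`. It indexes three stand-in lines so that it runs on its own, and it hard-codes the search term where the real script would presumably read it from the command line.

```groovy
// Assumed coordinates: Lucene 4.0.0 (not confirmed by the tutorial text).
@Grapes([
    @Grab('org.apache.lucene:lucene-core:4.0.0'),
    @Grab('org.apache.lucene:lucene-analyzers-common:4.0.0'),
    @Grab('org.apache.lucene:lucene-queryparser:4.0.0')
])
import org.apache.lucene.analysis.standard.StandardAnalyzer
import org.apache.lucene.document.Document
import org.apache.lucene.document.Field
import org.apache.lucene.document.StoredField
import org.apache.lucene.document.TextField
import org.apache.lucene.index.DirectoryReader
import org.apache.lucene.index.IndexWriter
import org.apache.lucene.index.IndexWriterConfig
import org.apache.lucene.queryparser.classic.QueryParser
import org.apache.lucene.search.IndexSearcher
import org.apache.lucene.store.RAMDirectory
import org.apache.lucene.util.Version

def analyzer = new StandardAnalyzer(Version.LUCENE_40)
def directory = new RAMDirectory()

// Index a few stand-in lines so there is something to search.
def writer = new IndexWriter(directory, new IndexWriterConfig(Version.LUCENE_40, analyzer))
['To be, or not to be, that is the question',
 'Friends, Romans, countrymen, lend me your ears',
 'Now is the winter of our discontent'].eachWithIndex { line, i ->
    def doc = new Document()
    doc.add(new TextField('line', line, Field.Store.YES))
    doc.add(new StoredField('lineNumber', (i + 1) as String))
    writer.addDocument(doc)
}
writer.close()

// Parse the term with the same analyzer that was used for indexing.
def term = 'Romans'
def query = new QueryParser(Version.LUCENE_40, 'line', analyzer).parse(term)

// Fetch the stored fields of the ten best matches.
def reader = DirectoryReader.open(directory)
def searcher = new IndexSearcher(reader)
def hits = searcher.search(query, 10).scoreDocs.collect { searcher.doc(it.doc) }
reader.close()

hits.each { println "${it.get('lineNumber')}: ${it.get('line')}" }
```

Using the same analyzer for parsing and indexing matters: the StandardAnalyzer lower-cases terms, so a query for `Romans` only matches because it is lower-cased in the same way.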
Finally it is time to run the script and see which lines match our term.
Try using wildcards too, as long as the wildcard is not the first character of the term (a leading wildcard breaks the rules of the Lucene Query Syntax):
groovy IndexAndSearchShakespeare.groovy Monk*
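To see the wildcard rule in isolation, the sketch below parses one trailing and one leading wildcard with the classic QueryParser. It assumes Lucene 4.0.0 and a hypothetical default field called `line`; no index is needed because only the parsing step is exercised.

```groovy
// Assumed coordinates: Lucene 4.0.0 (not confirmed by the tutorial text).
@Grapes([
    @Grab('org.apache.lucene:lucene-core:4.0.0'),
    @Grab('org.apache.lucene:lucene-analyzers-common:4.0.0'),
    @Grab('org.apache.lucene:lucene-queryparser:4.0.0')
])
import org.apache.lucene.analysis.standard.StandardAnalyzer
import org.apache.lucene.queryparser.classic.ParseException
import org.apache.lucene.queryparser.classic.QueryParser
import org.apache.lucene.search.PrefixQuery
import org.apache.lucene.util.Version

def parser = new QueryParser(Version.LUCENE_40, 'line', new StandardAnalyzer(Version.LUCENE_40))

// A trailing wildcard parses without complaint.
def trailing = parser.parse('Monk*')
println "Parsed ${trailing.getClass().simpleName}: ${trailing}"

// A leading wildcard is rejected unless setAllowLeadingWildcard(true) is called first.
def leadingRejected = false
try {
    parser.parse('*onk')
} catch (ParseException e) {
    leadingRejected = true
    println "Rejected: ${e.message}"
}
```

On a Unix-like shell it may also be worth quoting the argument (for example 'Monk*') so that the shell does not expand the asterisk before Groovy sees it.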