Lucene Lab 2

doriscarter

Presentation Transcript


  1. Lucene Lab 2 030209

  2. General IR Process
     1st, Index: Start indexing (step through all files) → Tokenize & stem each file → Index
     2nd, Query/Search: User enters a (roughly) natural-language query → Tokenize & stem the query → Run query against index → Results
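The two phases on this slide can be sketched in plain Java, without Lucene. This is a toy under stated assumptions: `ToyIrPipeline` and its methods are hypothetical names, tokenization is a simple lowercase-and-split on non-alphanumeric ASCII characters, and `crudeStem` is a deliberately naive suffix stripper standing in for a real stemmer such as Porter's. The point it shows is that documents and queries go through the same tokenize-and-stem step.

```java
import java.util.*;

// Toy, in-memory version of the two phases: 1st index, 2nd query/search.
// Hypothetical class; crudeStem is a naive stand-in, NOT the Porter stemmer.
class ToyIrPipeline {
    // Inverted index: term -> ids of the documents that contain it.
    private final Map<String, Set<String>> index = new TreeMap<>();

    // Tokenize (lowercase, split on non-alphanumeric ASCII) and stem each token.
    // The SAME method is used for documents and for queries.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                terms.add(crudeStem(token));
            }
        }
        return terms;
    }

    // Naive suffix stripping: "policies" -> "policy", "indexing" -> "index", "files" -> "file".
    static String crudeStem(String word) {
        if (word.length() > 4 && word.endsWith("ies")) {
            return word.substring(0, word.length() - 3) + "y";
        }
        if (word.length() > 5 && word.endsWith("ing")) {
            return word.substring(0, word.length() - 3);
        }
        if (word.length() > 3 && word.endsWith("s") && !word.endsWith("ss")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    // Phase 1: index one document.
    void addDocument(String docId, String text) {
        for (String term : analyze(text)) {
            index.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    // Phase 2: analyze the query the same way, then intersect postings (AND semantics).
    Set<String> search(String query) {
        Set<String> result = null;
        for (String term : analyze(query)) {
            Set<String> postings = index.getOrDefault(term, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(postings);
            } else {
                result.retainAll(postings);
            }
        }
        return result == null ? Collections.emptySet() : result;
    }
}
```

For example, after indexing "Privacy policies for students" and "Parking policy and fees", a search for "policy" matches both documents because "policies" is stemmed to the same term at index time.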

  3. Lucene Process
     1st, Index: IndexWriter.java + StandardAnalyzer.java (or another analyzer) → Index
     2nd, Query/Search: User enters a (roughly) natural-language query → Tokenize & stem the query → Run query against index → Results
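To illustrate why the analyzer is a separate, pluggable piece of this pipeline, here is a dependency-free sketch. `TextAnalyzer`, `SimpleLikeAnalyzer`, `StopLikeAnalyzer` and `AnalyzedIndex` are hypothetical names, not Lucene classes. The two analyzers only approximate the ideas behind Lucene's SimpleAnalyzer (split on non-letters, lowercase) and StopAnalyzer (the same, plus stop word removal); the stop word set is whatever the caller passes in.

```java
import java.util.*;

// Hypothetical interface standing in for an analyzer: raw text in, index terms out.
interface TextAnalyzer {
    List<String> terms(String text);
}

// Approximates the SimpleAnalyzer idea: split on non-letters, lowercase every token.
class SimpleLikeAnalyzer implements TextAnalyzer {
    public List<String> terms(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                out.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }
}

// Approximates the StopAnalyzer idea: SimpleLike analysis, then drop stop words.
class StopLikeAnalyzer implements TextAnalyzer {
    private final TextAnalyzer base = new SimpleLikeAnalyzer();
    private final Set<String> stopWords;

    StopLikeAnalyzer(Set<String> stopWords) {
        this.stopWords = stopWords;
    }

    public List<String> terms(String text) {
        List<String> out = new ArrayList<>();
        for (String term : base.terms(text)) {
            if (!stopWords.contains(term)) {
                out.add(term);
            }
        }
        return out;
    }
}

// Hypothetical writer-plus-index: the analyzer is chosen once, when the index is built,
// and the same analyzer is reused for queries.
class AnalyzedIndex {
    private final TextAnalyzer analyzer;
    private final Map<String, Set<String>> postings = new TreeMap<>();

    AnalyzedIndex(TextAnalyzer analyzer) {
        this.analyzer = analyzer;
    }

    void add(String docId, String text) {
        for (String term : analyzer.terms(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    Set<String> vocabulary() {
        return postings.keySet();
    }

    // OR semantics: a document matches if it contains any analyzed query term.
    Set<String> search(String query) {
        Set<String> hits = new TreeSet<>();
        for (String term : analyzer.terms(query)) {
            hits.addAll(postings.getOrDefault(term, Collections.emptySet()));
        }
        return hits;
    }
}
```

With a stop list of `the`, `of` and `and`, the text "The Policy of Leave and Vacation" puts only `leave`, `policy` and `vacation` into the stop-style index, while the simple-style index also keeps `the`, `of` and `and`. Choosing the analyzer is really choosing which terms end up in the index.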

  4. Lucene Lab
     All of the following will be run against the policies directory.
     1) Create your own StopWord file and run it with the StopAnalyzer. Export the results to an XML file. Send the following to Jeff by the beginning of class on Wednesday:
        • the source file
        • the XML file
        • your StopWord file
     2) Compile the SearchFiles.java program and run it against your indices. Do this for:
        -- indexing with the StandardAnalyzer
        -- indexing with the SimpleAnalyzer
        -- indexing with the StopAnalyzer
        -- indexing with the StopAnalyzer with your stop words
        For each of the above, do one run with the 'Streaming' option and one with the 'Paging' option. The \docs\demo2.html file briefly discusses the difference; review the usage statement in the source code to see how to select between the two. Take a screenshot of the results. This portion of the lab/homework therefore requires a total of 8 screenshots: one of the Streaming option and one of the Paging option for each of the four indices above.
     **REMEMBER: The SearchFiles program must use THE SAME ANALYZER as the one that created the index being searched.** For example, when you search the index created with the StopAnalyzer, your SearchFiles program must also use the StopAnalyzer in order to get appropriate results.
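The stop word step and the same-analyzer rule can also be illustrated without Lucene. In this sketch the stop word file format (one word per line, blank lines and `#` comment lines ignored) is an assumption, not necessarily the format the lab's StopAnalyzer setup expects, and `StopWordLab` with its methods is hypothetical. `mismatchedAnalyze` exaggerates the problem on purpose: it skips lowercasing, so its query terms never line up with the lowercased terms stored in the index.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.*;

class StopWordLab {
    // Assumed (hypothetical) file format: one stop word per line;
    // blank lines and lines starting with '#' are ignored.
    static Set<String> loadStopWords(Path file) throws IOException {
        Set<String> words = new HashSet<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String word = line.trim().toLowerCase(Locale.ROOT);
            if (!word.isEmpty() && !word.startsWith("#")) {
                words.add(word);
            }
        }
        return words;
    }

    // Letters-only tokens, lowercased, stop words removed (the StopAnalyzer idea, not Lucene code).
    static List<String> stopAnalyze(String text, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (String token : text.split("[^\\p{L}]+")) {
            String term = token.toLowerCase(Locale.ROOT);
            if (!term.isEmpty() && !stopWords.contains(term)) {
                out.add(term);
            }
        }
        return out;
    }

    // Deliberately mismatched query-side analyzer: letters-only tokens, but NO lowercasing.
    static List<String> mismatchedAnalyze(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                out.add(token);
            }
        }
        return out;
    }

    // Inverted index (term -> doc ids) built with stopAnalyze.
    static Map<String, Set<String>> buildIndex(Map<String, String> docs, Set<String> stopWords) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            for (String term : stopAnalyze(doc.getValue(), stopWords)) {
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(doc.getKey());
            }
        }
        return index;
    }

    // OR semantics over query terms that have already been analyzed.
    static Set<String> search(Map<String, Set<String>> index, List<String> queryTerms) {
        Set<String> hits = new TreeSet<>();
        for (String term : queryTerms) {
            hits.addAll(index.getOrDefault(term, Collections.emptySet()));
        }
        return hits;
    }
}
```

For example, with `policy` listed in the stop word file, a query for "Sick Leave" analyzed by `stopAnalyze` finds a document containing "The Policy of Sick Leave", while the same query analyzed by `mismatchedAnalyze` finds nothing.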

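The difference between streaming and paging output can be sketched in general terms. This is not the SearchFiles implementation (the lab points to \docs\demo2.html and the program's usage statement for that); `ResultDisplay`, `stream`, `page` and `pageCount` are hypothetical names, and the hits are plain strings rather than Lucene documents. Streaming hands every hit to the output as it arrives; paging returns one fixed-size slice of an already ranked list at a time.

```java
import java.util.*;
import java.util.function.Consumer;

class ResultDisplay {
    // Streaming: pass every hit to the consumer as soon as it is available;
    // nothing is split into pages. Returns the number of hits delivered.
    static int stream(Iterator<String> hits, Consumer<String> out) {
        int count = 0;
        while (hits.hasNext()) {
            out.accept(hits.next());
            count++;
        }
        return count;
    }

    // Paging: return only the hits on one page (0-based page number).
    static List<String> page(List<String> rankedHits, int pageNumber, int hitsPerPage) {
        if (pageNumber < 0 || hitsPerPage <= 0) {
            throw new IllegalArgumentException("pageNumber must be >= 0 and hitsPerPage > 0");
        }
        long from = (long) pageNumber * hitsPerPage; // long avoids int overflow
        if (from >= rankedHits.size()) {
            return Collections.emptyList();
        }
        int to = (int) Math.min(from + hitsPerPage, rankedHits.size());
        return rankedHits.subList((int) from, to);
    }

    // Number of pages needed to show totalHits results, rounding up.
    static int pageCount(int totalHits, int hitsPerPage) {
        return (totalHits + hitsPerPage - 1) / hitsPerPage;
    }
}
```

With 23 hits and 10 hits per page, paging yields 3 pages (10, 10 and 3 hits), whereas streaming delivers all 23 in one pass.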