
Lemur Indri Search Engine

Yatish Hegde, 03/03/2010


Presentation Transcript


  1. Lemur Indri Search Engine • Yatish Hegde, 03/03/2010

  2. Background • Open-source text search engine • Combines language modeling and inference networks • Query language based on the Inquery query language • API accessible from C++, Java, C# and PHP • Document formats: HTML, XML, TXT, trectext, trecweb, doc*, ppt*
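To illustrate the language-modeling half of Indri, the following is a minimal sketch. It scores a document for a flat #combine of plain terms using Dirichlet-smoothed query likelihood, where P(w|D) = (tf + mu * P(w|C)) / (|D| + mu). Dirichlet smoothing with mu = 2500 is Indri's default. The function names and the toy collection are hypothetical, and real Indri scores involve more machinery (inference-network operators, field handling) than this.

```python
import math
from collections import Counter

def collection_model(docs):
    """Maximum-likelihood P(w | C) over all tokens in a (toy) collection."""
    counts = Counter(token for doc in docs for token in doc)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}

def dirichlet_prob(term, doc_tf, doc_len, coll_prob, mu=2500.0):
    """P(w | D) = (tf(w, D) + mu * P(w | C)) / (|D| + mu)."""
    # Assumes the term occurs somewhere in the collection (coll_prob > 0).
    return (doc_tf.get(term, 0) + mu * coll_prob[term]) / (doc_len + mu)

def combine_score(query_terms, doc_tokens, coll_prob, mu=2500.0):
    """Flat #combine-style score: the mean of log P(q_i | D) over the query terms."""
    doc_tf = Counter(doc_tokens)
    logs = [math.log(dirichlet_prob(t, doc_tf, len(doc_tokens), coll_prob, mu))
            for t in query_terms]
    return sum(logs) / len(logs)

# Hypothetical three-document collection; mu is lowered because the documents are tiny.
docs = [
    ["white", "house", "press", "briefing"],
    ["oil", "fields", "in", "texas"],
    ["white", "paint", "for", "the", "house"],
]
cm = collection_model(docs)
for i, doc in enumerate(docs):
    print(i, round(combine_score(["white", "house"], doc, cm, mu=10.0), 3))
```

The shorter document that contains both terms ranks first, and the document with neither term ranks last. Smoothing keeps that last score finite instead of producing log(0).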

  3. Resources • Website: http://lemurproject.org • Tutorials: http://sourceforge.net/apps/trac/lemur/wiki • Forum: http://sourceforge.net/projects/lemur/forums

  4. How to get started? • Cygwin: http://cygwin.com (include the “perl”, “vi editor” and “make” packages when installing) • Lemur Toolkit: http://sourceforge.net/projects/lemur/develop • TREC Eval: http://trec.nist.gov/trec_eval/

  5. Installing Lemur • Inside the Lemur directory: ./configure, make, make install • Build an index: IndriBuildIndex • Run queries: IndriRunQuery
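After `make install`, the two steps above can be chained from a script. This is a hedged sketch, not part of Lemur. It assumes the binaries are on PATH and uses hypothetical parameter-file names (build_index.xml, query.xml, stopwords.xml, options.xml). It relies on IndriRunQuery printing its ranked results to standard output. With `dry_run=True` it only returns the command lines it would run.

```python
import shutil
import subprocess

def pipeline_commands(build_params, query_params):
    """The build and query steps as argv lists."""
    return [
        ["IndriBuildIndex", build_params],
        ["IndriRunQuery", *query_params],
    ]

def run_pipeline(build_params, query_params, results_path, dry_run=True):
    """Build the index, then run the queries and save the ranked list to results_path."""
    commands = pipeline_commands(build_params, query_params)
    if dry_run:
        return commands  # only report what would be executed
    for exe in ("IndriBuildIndex", "IndriRunQuery"):
        if shutil.which(exe) is None:
            raise FileNotFoundError(f"{exe} is not on PATH; did 'make install' succeed?")
    subprocess.run(commands[0], check=True)
    with open(results_path, "w") as out:
        # IndriRunQuery writes its results to stdout, so redirect them into a file.
        subprocess.run(commands[1], check=True, stdout=out)
    return commands

# Hypothetical file names; pass dry_run=False to actually run Indri.
print(run_pipeline("build_index.xml", ["query.xml", "stopwords.xml", "options.xml"], "query_results"))
```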

  6. Building Index
     • IndriBuildIndex <parameterFile>
     • Parameter file:
       <parameters>
         <index>/home/lemur/testindex</index>
         <memory>1G</memory>
         <corpus>
           <path>/home/lemur/testdata/firstCorpus</path>
           <class>trectext</class>
         </corpus>
         <corpus>
           <path>/home/lemur/testdata/secondCorpus</path>
           <class>trecweb</class>
         </corpus>
         <stemmer>
           <name>krovetz</name>
         </stemmer>
         <field>
           <name>p</name>
         </field>
       </parameters>
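When several corpora or indexes have to be configured, generating the parameter file is less error-prone than editing XML by hand. The following sketch uses Python's standard `xml.etree.ElementTree` to emit a file with the same shape as the one above. The function name and its defaults are hypothetical conveniences.

```python
import xml.etree.ElementTree as ET

def build_index_params(index_path, corpora, memory="1G", stemmer="krovetz", fields=()):
    """Return an IndriBuildIndex parameter file (as a string) shaped like the example above."""
    root = ET.Element("parameters")
    ET.SubElement(root, "index").text = index_path
    ET.SubElement(root, "memory").text = memory
    for path, doc_class in corpora:  # one <corpus> per (path, class) pair
        corpus = ET.SubElement(root, "corpus")
        ET.SubElement(corpus, "path").text = path
        ET.SubElement(corpus, "class").text = doc_class
    if stemmer:
        ET.SubElement(ET.SubElement(root, "stemmer"), "name").text = stemmer
    for field in fields:  # fields to index, e.g. "p"
        ET.SubElement(ET.SubElement(root, "field"), "name").text = field
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")

params = build_index_params(
    "/home/lemur/testindex",
    [("/home/lemur/testdata/firstCorpus", "trectext"),
     ("/home/lemur/testdata/secondCorpus", "trecweb")],
    fields=["p"],
)
print(params)
```

Save the printed string to a file and pass that path to IndriBuildIndex.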

  7. Running a Query
     • IndriRunQuery <queryFile> <stopwordFile> <queryOptions>
     • Query file:
       <parameters>
         <query>
           <number>701</number>
           <text>oil industry history</text>
         </query>
       </parameters>
     • Stopword file:
       <parameters>
         <stopper>
           <word>the</word>
         </stopper>
       </parameters>
     • Query options file:
       <parameters>
         <trecFormat>true</trecFormat>
         <index>/path/to/index</index>
         <count>1000</count>
       </parameters>
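The three files passed to IndriRunQuery can be generated the same way. This sketch mirrors the element names shown above. The helper names, the file names, and the extra stopwords "a" and "of" are hypothetical illustrations.

```python
import xml.etree.ElementTree as ET

def _serialize(root):
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")

def query_params(queries):
    """<parameters> with one <query> per (number, text) pair."""
    root = ET.Element("parameters")
    for number, text in queries:
        query = ET.SubElement(root, "query")
        ET.SubElement(query, "number").text = str(number)
        ET.SubElement(query, "text").text = text
    return _serialize(root)

def stopword_params(words):
    """<parameters><stopper> with one <word> per stopword."""
    root = ET.Element("parameters")
    stopper = ET.SubElement(root, "stopper")
    for word in words:
        ET.SubElement(stopper, "word").text = word
    return _serialize(root)

def option_params(index_path, count=1000, trec_format=True):
    """Run options: TREC-format output, which index to search, results per query."""
    root = ET.Element("parameters")
    ET.SubElement(root, "trecFormat").text = "true" if trec_format else "false"
    ET.SubElement(root, "index").text = index_path
    ET.SubElement(root, "count").text = str(count)
    return _serialize(root)

# Hypothetical file names; write each body to disk and pass the paths to IndriRunQuery.
files = {
    "query.xml": query_params([(701, "oil industry history")]),
    "stopwords.xml": stopword_params(["the", "a", "of"]),
    "options.xml": option_params("/path/to/index"),
}
for name, body in files.items():
    print(f"--- {name}\n{body}")
```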

  8. Converting Topic File into Query File
     • Topic file:
       <top>
       <num> Number: 301
       <title> International Organized Crime
       <desc> Description:
       Identify organizations that participate in international criminal activity, the activity, and, if possible, collaborating organizations and the countries involved.
       <narr> Narrative:
       A relevant document must as a minimum identify the organization and the type of illegal activity (e.g., Columbian cartel exporting cocaine). Vague references to international drug trade without identification of the organization(s) involved would not be relevant.
       </top>

  9. Converting Topic File into Query File • Perl program: ./topicToQuery.pl [-t] [-d] <inputFile> <outputFile> • Help: ./topicToQuery.pl -h
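The source of topicToQuery.pl is not included here, so this is a hypothetical Python equivalent rather than the author's script. It assumes that -t selects the <title> field and -d the <desc> field. Following the query example in slide 12, it strips punctuation and wraps the words in #combine. TREC topic fields have no closing tags, so each field is read up to the next tag or the end of the topic.

```python
import re
import xml.etree.ElementTree as ET

# A TREC topic field runs from its opening tag to the next field tag or the end of the topic.
FIELD_RE = re.compile(r"<(num|title|desc|narr)>(.*?)(?=<(?:num|title|desc|narr)>|\Z)", re.S)
LABEL_RE = re.compile(r"^(Number|Description|Narrative):")

def parse_topics(text):
    """Return one dict per <top>...</top> block, keyed by field name."""
    topics = []
    for block in re.findall(r"<top>(.*?)</top>", text, re.S):
        topic = {}
        for tag, body in FIELD_RE.findall(block):
            body = LABEL_RE.sub("", body.strip())  # drop "Number:", "Description:", ...
            topic[tag] = " ".join(body.split())    # collapse line breaks and extra spaces
        topics.append(topic)
    return topics

def to_query_text(topic, use_title=True, use_desc=False):
    """Wrap the chosen fields in #combine, keeping only alphanumeric words (assumption)."""
    parts = [topic.get("title", "") if use_title else "",
             topic.get("desc", "") if use_desc else ""]
    words = re.findall(r"[A-Za-z0-9]+", " ".join(parts))
    return "#combine(" + " ".join(words) + ")"

def topics_to_query_file(text, use_title=True, use_desc=False):
    """Build an IndriRunQuery query file from every topic in text."""
    root = ET.Element("parameters")
    for topic in parse_topics(text):
        query = ET.SubElement(root, "query")
        ET.SubElement(query, "number").text = topic["num"]
        ET.SubElement(query, "text").text = to_query_text(topic, use_title, use_desc)
    ET.indent(root)  # requires Python 3.9+
    return ET.tostring(root, encoding="unicode")

# Topic 301 from slide 8 (narrative abridged).
sample = """<top>
<num> Number: 301
<title> International Organized Crime
<desc> Description:
Identify organizations that participate in international criminal activity,
the activity, and, if possible, collaborating organizations and the countries involved.
<narr> Narrative:
A relevant document must as a minimum identify the organization and the type of illegal activity.
</top>"""
print(topics_to_query_file(sample, use_title=True, use_desc=True))
```

Reading several <top> blocks from one file is what makes this useful for a full TREC topic set.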

  10. TREC Eval • Build with make • trec_eval -q -c -M1000 official_qrels query_results • More documentation: http://trecvid.nist.gov/trecvid.tools/trec_eval_video/README
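trec_eval reports many measures. As a rough sketch of one of them, this code computes mean average precision from a qrels file ("qid iteration docno relevance") and a TREC-format run ("qid Q0 docno rank score tag"). The TREC-format run is what IndriRunQuery produces with trecFormat set to true. The `max_docs` cutoff plays the role of -M1000, and `complete=True` mimics -c, which averages over every judged query and counts missing ones as 0. The function names are hypothetical, and trec_eval's own tie-breaking and edge cases may differ.

```python
from collections import defaultdict

def read_qrels(lines):
    """qrels lines: 'qid iteration docno relevance'. Keeps docs judged relevant (> 0)."""
    relevant = defaultdict(set)
    for line in lines:
        qid, _iteration, docno, judgment = line.split()
        if int(judgment) > 0:
            relevant[qid].add(docno)
    return relevant

def read_run(lines, max_docs=1000):
    """Run lines: 'qid Q0 docno rank score tag'. Ranks by descending score, keeps max_docs."""
    scored = defaultdict(list)
    for line in lines:
        qid, _q0, docno, _rank, score, _tag = line.split()
        scored[qid].append((float(score), docno))
    return {qid: [docno for _, docno in sorted(docs, key=lambda d: -d[0])[:max_docs]]
            for qid, docs in scored.items()}

def average_precision(ranked, relevant):
    """Sum of precision@k at each relevant hit, divided by the number of relevant docs."""
    hits, total = 0, 0.0
    for k, docno in enumerate(ranked, start=1):
        if docno in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)  # relevant docs never retrieved contribute 0

def mean_average_precision(qrels, run, complete=True):
    """complete=True mirrors -c: judged queries missing from the run score 0."""
    qids = set(qrels) if complete else set(qrels) & set(run)
    aps = [average_precision(run.get(qid, []), qrels[qid]) for qid in qids]
    return sum(aps) / len(aps) if aps else 0.0

# Hypothetical judgments and run: query 302 is judged but has no results.
qrels = read_qrels(["301 0 D1 1", "301 0 D2 0", "301 0 D3 1", "302 0 D9 1"])
run = read_run(["301 Q0 D1 1 9.0 indri", "301 Q0 D2 2 8.0 indri", "301 Q0 D3 3 7.0 indri"])
print(mean_average_precision(qrels, run), mean_average_precision(qrels, run, complete=False))
```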

  11. Lemur Search UI • User interface: http://sourceforge.net/apps/trac/lemur/wiki/The%20Lemur%20CGI%20Application • What it looks like: http://sewell.syr.edu/lemur/lemur.cgi

  12. Indri Query Language
     • #combine(white house): combines the beliefs of both terms
     • #1(white house): exact phrase
     • #5(white house): terms in order, with at most 4 terms between them
     • #band(white house): Boolean AND
     • #band(oil fields) #1(white house)
     • Example query file:
       <parameters>
         <query>
           <number> 301 </number>
           <text> #combine( Identify organizations that participate in #max( #1( international criminal activity) international criminal activity ) the activity and if possible collaborating organizations and the countries involved) </text>
         </query>
       </parameters>
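Structured queries like the example for topic 301 are easier to assemble with small string helpers than by hand. The operators used here (#combine, #N ordered window, #band, #max) are real Indri operators. The helper names and the `sanitize` rule, which keeps only alphanumeric words the way the example drops the commas from the description, are hypothetical.

```python
import re

def sanitize(text):
    """Keep only alphanumeric words (assumption: drops commas and other punctuation)."""
    return " ".join(re.findall(r"[A-Za-z0-9]+", text))

def _operator(name, *args):
    return f"#{name}(" + " ".join(args) + ")"

def combine(*args):
    return _operator("combine", *args)

def ordered_window(n, *args):
    """#N(...): terms in order with at most N-1 terms between each; #1 is an exact phrase."""
    return _operator(str(n), *args)

def band(*args):
    return _operator("band", *args)

def max_belief(*args):
    return _operator("max", *args)

# Rebuilds the topic 301 query from the example query file.
phrase = "international criminal activity"
query = combine(
    "Identify organizations that participate in",
    max_belief(ordered_window(1, phrase), phrase),
    sanitize("the activity, and, if possible, collaborating organizations "
             "and the countries involved"),
)
print(query)
```

The result matches the slide's query up to whitespace inside the parentheses.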

  13. Contact • If you have questions: Yatish Hegde, yhegde@syr.edu

  14. Thank You
