Information Retrieval Project
Team 9
• 黃國瑜 (90522035), CS graduate program, 1st year
• 何聰鑫 (90522045), CS graduate program, 1st year
• 丁智凱 (90522077), CS graduate program, 1st year

System architecture
• CPU: Pentium III, 1 GHz
• RAM: 256 MB
• OS: Windows 2000
• Programming language: PHP
• Database: MySQL

Indexing method (1/3)
• Indexing
  • Converting all letters to lower case
  • Eliminating stopwords
    • Looked up in a hash table
    • 317 words
  • Removing punctuation marks
  • Removing words shorter than 3 letters
  • Removing <tag> markup
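
The preprocessing steps above can be sketched in a few lines. This is only an illustration in Python, not the project's PHP code. The stopword set is a small hypothetical sample (the actual 317-word list is not given in the slides), the function name `index_terms` is made up, and the order of the steps is an assumption.

```python
import re
import string

# Hypothetical sample; the project's real 317-word stopword list is not shown.
# A Python set is hash-based, which mirrors the hash-table lookup on the slide.
STOPWORDS = {"the", "and", "for", "with", "that", "this"}

TAG_RE = re.compile(r"<[^>]+>")
# Translation table that turns every ASCII punctuation mark into a space.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def index_terms(text):
    """Return the list of index terms extracted from one piece of text."""
    text = TAG_RE.sub(" ", text)          # remove <tag> markup first
    text = text.lower()                   # lower-case all letters
    text = text.translate(PUNCT_TABLE)    # remove punctuation marks
    return [
        word
        for word in text.split()
        if len(word) >= 3 and word not in STOPWORDS  # drop short words, stopwords
    ]
```

Tags are stripped before punctuation here because `<` and `>` are themselves punctuation; removing punctuation first would leave the tag names behind as ordinary words.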

Indexing method (2/3)
• Database tables
  • IndexMap (Index, TermID, DocID, Line, Pattern)
  • DocMap (DocID, FileName, DocTitle)
  • TermMap (TermID, Term)
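
To make the three tables concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for MySQL. Only the table and column names come from the slide. The column types, the keys, the reading of `Line` and `Pattern` as integers, and the helper names `create_index_db` and `docs_for_term` are all assumptions.

```python
import sqlite3

# Column types and keys are assumptions; the slide lists only column names.
# "Index" is quoted because INDEX is a reserved word in SQL.
SCHEMA = """
CREATE TABLE TermMap (TermID INTEGER PRIMARY KEY, Term TEXT UNIQUE);
CREATE TABLE DocMap (DocID INTEGER PRIMARY KEY, FileName TEXT, DocTitle TEXT);
CREATE TABLE IndexMap (
    "Index" INTEGER PRIMARY KEY,
    TermID INTEGER REFERENCES TermMap(TermID),
    DocID INTEGER REFERENCES DocMap(DocID),
    Line INTEGER,
    Pattern INTEGER
);
"""


def create_index_db(path=":memory:"):
    """Create the three tables and return the open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def docs_for_term(conn, term):
    """Return the file names of all documents that contain the given term."""
    rows = conn.execute(
        """
        SELECT DISTINCT d.FileName
        FROM TermMap t
        JOIN IndexMap i ON i.TermID = t.TermID
        JOIN DocMap d ON d.DocID = i.DocID
        WHERE t.Term = ?
        ORDER BY d.FileName
        """,
        (term,),
    ).fetchall()
    return [row[0] for row in rows]
```

Storing each term and document once and linking them through IndexMap keeps the large posting table down to integer IDs, which is presumably why the schema is split this way.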

Indexing method (3/3)
• Indexing speed
  • 130 sec/MB
  • Total: 125 sec/MB × 490 MB ≈ 17 hr
• Example
  • File Name: FB496255
  • File Size: 997438 bytes
  • Total Terms: 8523
  • Start: 1004540338.9145 sec
  • End: 1004540464.1279 sec
  • Total: 125.2134180069 sec
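
The figures on this slide can be checked with a short calculation. The sketch below assumes the file size is in bytes and that the start and end values are Unix timestamps in seconds; the function names are made up for illustration. Under those assumptions, the single-file example works out to roughly 125 sec per 10^6 bytes, or a little over 130 sec per 2^20 bytes, which may explain why both 125 and 130 appear on the slide.

```python
def elapsed_seconds(start, end):
    """Elapsed time between two timestamps, in seconds."""
    return end - start


def seconds_per_mb(elapsed, size_bytes, bytes_per_mb=1_000_000):
    """Indexing cost in seconds per megabyte for one file."""
    return elapsed / (size_bytes / bytes_per_mb)


def total_hours(sec_per_mb, corpus_mb):
    """Projected indexing time for a whole corpus, in hours."""
    return sec_per_mb * corpus_mb / 3600


# Values taken from the slide.
elapsed = elapsed_seconds(1004540338.9145, 1004540464.1279)
per_decimal_mb = seconds_per_mb(elapsed, 997438)
per_binary_mb = seconds_per_mb(elapsed, 997438, bytes_per_mb=1024 * 1024)
projected = total_hours(125, 490)  # 125 sec/MB over 490 MB
```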

Query (1/3)
• Interface
  • Query
  • Insert New Data
  • View Existing Data
  • Help
  • Mail

Query (2/3)
• Query
  • Features
    • Multiple-keyword query
    • Title query
  • Speed (depends on network speed and the number of results)
    • Match String: 6448, Search Time: 2.3293360471725 sec
    • Match String: 239, Search Time: 0.72075593471527 sec
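
A multiple-keyword query with a reported search time might look like the following sketch. It uses an in-memory dictionary rather than the project's MySQL tables, and it assumes that a document must contain every keyword (AND semantics); the slides do not say whether keywords are combined with AND or OR. The names `build_postings` and `multi_keyword_search` are hypothetical.

```python
import time


def build_postings(docs):
    """Map each lower-cased word to the set of document names containing it."""
    postings = {}
    for name, text in docs.items():
        for word in text.lower().split():
            postings.setdefault(word, set()).add(name)
    return postings


def multi_keyword_search(postings, keywords):
    """Return (matching document names, search time in seconds).

    Assumes AND semantics: a document matches only if it contains
    every keyword.
    """
    start = time.perf_counter()
    sets = [postings.get(keyword.lower(), set()) for keyword in keywords]
    matches = sorted(set.intersection(*sets)) if sets else []
    elapsed = time.perf_counter() - start
    return matches, elapsed
```

Timing only the lookup, as done here, leaves out network transfer and page rendering, both of which the slide says affect the reported search time.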

Query (3/3)
• Output
  • Performance
    • Match String
    • Search Time
  • Query Result
    • File Name
    • Document Title
    • Line (shows 5 lines)
    • # of Pattern (matches are highlighted)
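
The result display described above (up to five matching lines, highlighted matches, and a pattern count) could be produced along these lines. This is a sketch under assumptions: matching is taken to be case-insensitive, the highlight marker `**` is arbitrary, and `highlight_lines` is a made-up name.

```python
import re


def highlight_lines(lines, keyword, max_lines=5, mark="**"):
    """Return (highlighted lines to show, total number of pattern matches).

    At most max_lines matching lines are returned, but the count
    covers every match in the document.
    """
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    shown = []
    total = 0
    for line in lines:
        count = len(pattern.findall(line))
        if count == 0:
            continue
        total += count
        if len(shown) < max_lines:
            # Wrap each match in the marker, keeping its original casing.
            shown.append(pattern.sub(lambda m: f"{mark}{m.group(0)}{mark}", line))
    return shown, total
```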