
Searching


bono




Presentation Transcript


  1. Searching • Google: page rank and anchor text • HITS: hubs and authorities • MSN’s RankNet: learning to rank • Today’s web dragons

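  2. How to search: Google’s pagerank • Pagerank • Anchor text. Each page $p$ that links to ~me (figure: pages p1, p2, p3 and ~me) passes on its rank divided by its number of outlinks: $\mathrm{rank}(\sim\!\mathrm{me}) = \sum_{p \to \sim\mathrm{me}} \frac{\mathrm{rank}(p)}{\#\mathrm{outlinks}(p)}$. With $o(p)$ the number of outlinks of $p$ and $C(p,q) = 1/o(p)$ if $p$ links to $q$ (0 otherwise): $r(q) = \alpha \sum_{p} C(p,q)\, r(p)$, i.e. $r = \alpha\, C^{\mathsf T} r$, so $r$ is an eigenvector of $C^{\mathsf T}$ (with eigenvalue $1/\alpha$). Random surfer model • Broken links (hence the factor $\alpha$) • Trapping states (adjust $C$)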
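
The iteration behind the eigenvector equation can be sketched in a few lines of Python. This is a minimal sketch, not Google's implementation: it uses the common damping-factor formulation, where a hypothetical parameter `d` (0.85 here, an assumed value) is the probability that the random surfer follows a link rather than jumping to a random page. That random jump is one standard way to "adjust C" for trapping states, and pages without outlinks spread their rank evenly instead of leaking it. Note that `d` plays a different role from the slide's $\alpha$. The function name `pagerank`, the dict-of-lists input, and the toy graph are illustrative choices.

```python
# Minimal PageRank power iteration (damping-factor formulation; d is NOT the slide's alpha).
def pagerank(links, d=0.85, iters=200, tol=1e-12):
    """links: dict mapping each page to the list of pages it links to."""
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    r = {p: 1.0 / n for p in pages}  # start from the uniform distribution
    for _ in range(iters):
        new = {p: (1.0 - d) / n for p in pages}  # random jump: escapes trapping states
        for p in pages:
            outs = links.get(p, [])
            if outs:
                share = r[p] / len(outs)  # C(p, q) = 1/o(p)
                for q in outs:
                    new[q] += d * share
            else:
                # page with no (working) outlinks: spread its rank over all pages
                for q in pages:
                    new[q] += d * r[p] / n
        delta = sum(abs(new[p] - r[p]) for p in pages)
        r = new
        if delta < tol:
            break
    return r


# Toy graph in the spirit of the slide's figure (these links are made up).
ranks = pagerank({"p1": ["~me"], "p2": ["~me"], "p3": ["~me", "p1"], "~me": ["p1"]})
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.4f}")
```

Because every iteration redistributes exactly one unit of rank, the scores stay a probability distribution: the long-run share of time the random surfer spends on each page.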

  3. Random surfer. [Figure: chart of the web vs. random searcher: New archipelago, 20% of nodes • Milgram’s continent, 30% of nodes • Corporate continent, 20% of nodes • Terra incognita, 30% of nodes]

  4. Google search: anchor text • Pagerank • Anchor text. Figure: page ~me says “this is the best page ever”; page you links to ~me with the anchor text “that is the best page ever”, so those words also describe ~me. Google uses: • In anchor text? • In URL? • Title • Meta tags • <h> level • Relative font size • Capitalization • Word position in doc • Secret ingredients … and weights them according to a secret recipe
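
To make the anchor-text idea concrete, here is a toy indexer in which the words of a link's anchor text are credited to the page the link points to, so ~me can match “that is the best page ever” even though its own text never says “that”. Everything here is hypothetical: the field names, the two weights in `WEIGHTS`, and the additive `score` function stand in for Google's secret features and recipe, and only body text and anchor text are modelled.

```python
from collections import defaultdict

# Hypothetical field weights; Google's real features and weights are secret.
WEIGHTS = {"body": 1.0, "anchor": 2.0}


def build_index(pages, links):
    """pages: {url: body text}; links: [(source, target, anchor_text), ...]."""
    index = defaultdict(lambda: defaultdict(set))  # term -> url -> fields it occurs in
    for url, body in pages.items():
        for term in body.lower().split():
            index[term][url].add("body")
    for _source, target, anchor in links:
        for term in anchor.lower().split():
            index[term][target].add("anchor")  # anchor words describe the *target* page
    return index


def score(index, query, url):
    """Sum the field weights of every query term that occurs in the page."""
    total = 0.0
    for term in query.lower().split():
        for field in index.get(term, {}).get(url, ()):
            total += WEIGHTS[field]
    return total


pages = {"~me": "this is the best page ever", "you": "links to pages I like"}
links = [("you", "~me", "that is the best page ever")]
index = build_index(pages, links)
print(score(index, "best page", "~me"))  # both terms hit ~me in body and anchor text
```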

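  5. HITS: hubs and authorities. A page’s hub score is the sum of the authority scores of the pages it links to: $\mathrm{hub}(x) = \sum_{p} C(x,p)\,\mathrm{auth}(p)$, where $C(x,p) = 1$ if $x$ links to $p$ (0 otherwise). In matrix form $\mathrm{hub} = C\,\mathrm{auth}$ and $\mathrm{auth} = C^{\mathsf T}\mathrm{hub}$, so $\mathrm{hub} = C C^{\mathsf T}\mathrm{hub}$ (up to normalization): hub is an eigenvector of $C C^{\mathsf T}$. Principal eigenvector → strongest community; other eigenvectors → other communities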
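
The matrix equations suggest a power iteration: alternate $\mathrm{auth} = C^{\mathsf T}\mathrm{hub}$ and $\mathrm{hub} = C\,\mathrm{auth}$, normalizing after each step. A minimal NumPy sketch, assuming a small dense 0/1 adjacency matrix `C` and a made-up five-page graph; the last lines check the result against the principal eigenvector of $CC^{\mathsf T}$ computed with `numpy.linalg.eigh`.

```python
import numpy as np


def hits(C, iters=100):
    """Power iteration for HITS. C[x, p] = 1 if page x links to page p, else 0."""
    hub = np.ones(C.shape[0])
    for _ in range(iters):
        auth = C.T @ hub                 # auth = C^T hub
        auth /= np.linalg.norm(auth)
        hub = C @ auth                   # hub = C auth
        hub /= np.linalg.norm(hub)
    return hub, auth


# Toy graph (made up): pages 0 and 1 both link to 2 and 3; page 4 links only to 2.
C = np.zeros((5, 5))
for x, p in [(0, 2), (0, 3), (1, 2), (1, 3), (4, 2)]:
    C[x, p] = 1.0

hub, auth = hits(C)
print("hubs:", hub.round(3))
print("authorities:", auth.round(3))

# Cross-check: hub is the principal eigenvector of C C^T (eigh sorts eigenvalues ascending).
vals, vecs = np.linalg.eigh(C @ C.T)
print("principal eigenvector:", np.abs(vecs[:, -1]).round(3))
```

The other columns of `vecs`, ordered by eigenvalue, are the non-principal eigenvectors that the slide associates with other communities.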

  6. Using HITS: Ask’s Teoma. Web communities for the query jaguar: jaguar <car> • jaguar <animal> • jaguar <Mac OS> • jaguar <auto racing team> • jaguar <Jacksonville Jaguars>

  7. Using HITS: Ask’s Teoma. Query → neighborhood graph (search hits + their neighbors) → web communities (the same jaguar example as slide 6). Hub scores: lists of resources. Authority scores: target pages. This helps to deal with synonyms and pulls in other relevant pages (e.g. Toyota is an authority for “auto manufacturers” yet doesn’t contain the term)
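
Building the neighborhood graph can be sketched as below, in the style of Kleinberg's original HITS: start from the search hits (the root set), add every page they link to and every page that links to them, and keep only the links among those pages. This illustrates the general technique and is not a description of Teoma's actual pipeline; the function name, page names, and links are invented. Kleinberg's version also caps how many in-linking pages are added per hit, which this sketch omits.

```python
def neighborhood_graph(root, links):
    """root: pages returned by the text search; links: {page: set of pages it links to}."""
    base = set(root)
    for p in root:
        base |= links.get(p, set())                            # pages the hits point to
        base |= {q for q, outs in links.items() if p in outs}  # pages pointing at the hits
    # keep only the links between pages in the base set
    return {p: {q for q in links.get(p, set()) if q in base} for p in base}


# Made-up example: the text search for "auto manufacturers" only finds car-list.
links = {
    "car-list": {"toyota", "honda"},
    "blog": {"car-list"},
    "cats": {"jaguar-animal"},
}
graph = neighborhood_graph(["car-list"], links)
print(sorted(graph))  # toyota is pulled in without containing the query terms
```

Running the `hits` iteration on this subgraph is what lets a page such as toyota score as an authority for the query.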

  8. Learning to rank: MSN’s RankNet • Training set: queries with matching documents, judged by humans • Discriminant function: e.g. a weighted sum of features, plus a threshold • Machine learning: learn the weights • Apply to real queries. RankNet setup: • 17,000 queries, 10 documents/query • Human judgement (1–5) • 600 features • Pairs of docs with the same query: which is more highly ranked? • Train a neural net (1-layer, 2-layer) • Results? Pretty good
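
The pairwise idea can be sketched with the 1-layer (linear) case. As RankNet is commonly described, the model scores each document, turns the score difference $s_i - s_j$ of a pair into a probability with the logistic function, and minimizes the cross-entropy against the human preference. This minimal NumPy sketch uses plain stochastic gradient descent; the function name, learning rate, epoch count, and the synthetic three-feature data (standing in for the slide's 600 features and 1–5 judgements) are all assumptions.

```python
import numpy as np


def train_linear_ranknet(pairs, n_features, lr=0.1, epochs=200, seed=0):
    """pairs: list of (x_better, x_worse) feature vectors for two docs of the same query."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=n_features)
    for _ in range(epochs):
        for x_i, x_j in pairs:
            diff = x_i - x_j
            s = w @ diff                       # score difference s_i - s_j
            p = 1.0 / (1.0 + np.exp(-s))       # modelled P(doc i ranks above doc j)
            # gradient step on the pairwise cross-entropy -log p (target probability 1)
            w += lr * (1.0 - p) * diff
    return w


# Synthetic data (made up): 3 features instead of the slide's 600.
rng = np.random.default_rng(1)
true_w = np.array([1.0, -2.0, 0.5])           # hidden weights that generate the judgements
docs = rng.normal(size=(200, 3))
rel = docs @ true_w
pairs = []
for i in range(0, 200, 2):
    a, b = docs[i], docs[i + 1]
    pairs.append((a, b) if rel[i] > rel[i + 1] else (b, a))

w = train_linear_ranknet(pairs, n_features=3)
cosine = w @ true_w / (np.linalg.norm(w) * np.linalg.norm(true_w))
print("learned weights:", w.round(2), "cosine with true weights:", round(float(cosine), 3))
```

Only score differences enter the loss, so the threshold in the discriminant function drops out of pairwise training. A 2-layer net would replace the linear score with a small hidden-layer network f and use f(x_i) − f(x_j).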

  9. Today’s web dragons. Sergey Brin, Larry Page • Google 49% (1998, 2004) • Yahoo 23% (1994, 1996, Inktomi 2002, AltaVista 2003) • MSN 10% (2005) • AOL 7% (Excite since 1997, Google since 2002) • Ask (Jeeves) 2% (Teoma 2001)
