
Experiments with MATLAB Google PageRank

Roger Jang (張智星), CSIE Dept, National Taiwan University, Taiwan. jang@mirlab.org, http://mirlab.org/jang



  1. Experiments with MATLAB Google PageRank
     • Roger Jang (張智星), CSIE Dept, National Taiwan University, Taiwan
     • jang@mirlab.org, http://mirlab.org/jang

  2. PageRank Algorithm
     • Facts about the PageRank algorithm
       • Developed by Google’s founders, Larry Page and Sergey Brin, when they were graduate students at Stanford University
       • Determined entirely by the link structure of the WWW
       • Recomputed about once a month
       • The world’s largest matrix computation
     • Ideas
       • A random walk problem, modeled as a Markov chain (Markov process)
       • PageRank: the limiting probability that a random surfer visits a page
       • A page has high rank if other pages with high rank link to it.

  3. Connectivity Matrix G
     • Notation
       • U: the set of all n web pages in the world (n > 4 billion by June 2004)
       • G: the n-by-n connectivity matrix
       • $g_{ij} = 1$ if there is a hyperlink to page i from page j, and $g_{ij} = 0$ otherwise.
     • Facts
       • G is huge, but very sparse
       • The number of nonzeros in G is the total number of hyperlinks in U.
     • [Figure: a tiny web of six pages, numbered 1 to 6]
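The definition of G can be sketched in Python as follows. The six-page link list is a hypothetical example chosen for illustration (the links in the slide's figure are not recoverable from the transcript), and the names `links` and `connectivity_matrix` are assumptions. A dense list of lists is used only because the example is tiny; a real web graph would need sparse storage.

```python
# Hypothetical six-page web: each pair (j, i) means "page j links to page i".
# Pages are numbered 1..6; the links themselves are assumed for illustration.
links = [(1, 2), (1, 6), (2, 3), (2, 4), (3, 4), (3, 5), (3, 6), (4, 1), (6, 1)]


def connectivity_matrix(links, n):
    """Return an n-by-n matrix G with G[i-1][j-1] = 1 if page j links to page i."""
    G = [[0] * n for _ in range(n)]
    for j, i in links:
        G[i - 1][j - 1] = 1  # row = target page i, column = source page j
    return G


G = connectivity_matrix(links, 6)

# Each hyperlink contributes exactly one nonzero entry to G.
nonzeros = sum(sum(row) for row in G)
print(nonzeros)
```

Note the index order: the source page selects the column and the target page selects the row, matching the convention that $g_{ij}$ refers to a link to page i from page j.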

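  4. Degrees of a Page
     • Define the row and column sums of G:
       • $c_j = \sum_i g_{ij}$: out-degree of page j (the number of links leaving page j)
       • $r_i = \sum_j g_{ij}$: in-degree of page i (the number of links pointing to page i)
     • [Figure: a tiny web of six pages, numbered 1 to 6]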
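As a small illustration of the two degrees, the sketch below computes column and row sums for a hypothetical three-page connectivity matrix. The matrix and the variable names are assumptions made for this example only.

```python
# Hypothetical 3-page web (assumed example): G[i][j] = 1 if page j+1 links to page i+1.
# Links: 1 -> 2, 1 -> 3, 2 -> 3, 3 -> 1.
G = [
    [0, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
]
n = len(G)

# Column sums c_j: how many links leave page j (out-degree).
out_degree = [sum(G[i][j] for i in range(n)) for j in range(n)]

# Row sums r_i: how many links point to page i (in-degree).
in_degree = [sum(G[i][j] for j in range(n)) for i in range(n)]

print(out_degree, in_degree)
```

Both lists add up to the total number of hyperlinks, since every link leaves one page and arrives at one page.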

  5. Transition Matrix A
     • The jth column of A holds the probabilities of jumping from page j to every page
     • Two types of transitions
       • Type 1: Follow one of the links on the current page (with prob. p)
       • Type 2: Jump to a random page (with prob. 1-p)
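One common way to combine the two transition types into a single matrix entry is sketched below. The symbol $\delta$ and the handling of pages with no out-links ($c_j = 0$) are assumptions that the slide text does not state; here $c_j$ is the out-degree of page j.

$$
a_{ij} =
\begin{cases}
\dfrac{p\, g_{ij}}{c_j} + \delta, & c_j \neq 0, \\[2ex]
\dfrac{1}{n}, & c_j = 0,
\end{cases}
\qquad \delta = \frac{1-p}{n}.
$$

With this choice each column sums to 1: for $c_j \neq 0$, $\sum_i a_{ij} = p \sum_i g_{ij}/c_j + n\delta = p + (1-p) = 1$.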

  6. Transition Matrix A
     • Facts
       • A is the transition probability matrix of the Markov chain.
       • Its elements are all strictly between 0 and 1, and its column sums are all equal to 1.
       • A comes from scaling each column of G by its column sum, weighting by p, and adding (1-p)/n to every element.
       • Most of the elements of A are equal to (1-p)/n.
       • If n = 4×10^9 and p = 0.85, then (1-p)/n = 3.75×10^-11.
       • Perron-Frobenius theorem: a nonzero solution of x = Ax exists and is unique to within a scaling factor.
       • If the scaling factor is chosen so that the elements of x sum to 1, then x is Google’s PageRank.
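The facts above can be checked numerically on a small case. The sketch below builds A from a hypothetical three-page connectivity matrix; the function name `transition_matrix` and the uniform column used for pages without out-links are assumptions, not something the slides specify.

```python
def transition_matrix(G, p=0.85):
    """Build a column-stochastic matrix A from connectivity matrix G.

    G[i][j] = 1 if page j links to page i. Columns with no out-links are
    given the uniform value 1/n (an assumed convention).
    """
    n = len(G)
    delta = (1.0 - p) / n
    c = [sum(G[i][j] for i in range(n)) for j in range(n)]  # out-degrees
    A = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(n):
            if c[j] == 0:
                A[i][j] = 1.0 / n
            else:
                A[i][j] = p * G[i][j] / c[j] + delta
    return A


# Hypothetical 3-page web: 1 -> 2, 1 -> 3, 2 -> 3, 3 -> 1.
G = [
    [0, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
]
A = transition_matrix(G)
column_sums = [sum(A[i][j] for i in range(3)) for j in range(3)]
print(column_sums)

# The value quoted on the slide for n = 4e9 pages and p = 0.85.
delta_web = (1 - 0.85) / 4e9
print(delta_web)
```

In a web-scale matrix nearly every entry equals this tiny `delta_web` value, which is why A is never formed explicitly in practice.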

  7. How to Compute PageRank
     • Eigenvector method
       • x = A*x → x is the eigenvector corresponding to eigenvalue 1
       • Fact
         • A always has an eigenvalue of 1
     • Power method
       • Repeat x = A*x until x converges
       • The only practical approach for large n
       • Fact
         • 1 is the eigenvalue of A with the largest magnitude
         • A^n x does not depend on the starting vector x once the power n is large enough (provided x sums to 1)
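A minimal sketch of the power method in plain Python follows. The function name `power_method`, the stopping tolerance, the iteration cap, and the 3-by-3 example matrix are all assumptions chosen for illustration; the slides only say to repeat x = A*x until x converges.

```python
def power_method(A, tol=1e-12, max_iter=1000):
    """Repeat x = A*x from a uniform start until x stops changing."""
    n = len(A)
    x = [1.0 / n] * n  # start from a vector whose elements sum to 1
    for _ in range(max_iter):
        x_new = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        change = sum(abs(a - b) for a, b in zip(x_new, x))
        x = x_new
        if change < tol:
            break
    return x


# Hypothetical column-stochastic matrix (every column sums to 1).
A = [
    [0.05, 0.05, 0.90],
    [0.475, 0.05, 0.05],
    [0.475, 0.90, 0.05],
]
x = power_method(A)
print(x)
```

Because each column of A sums to 1, every multiplication preserves the sum of the elements of x, so no renormalization step is needed in this sketch.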

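  8. Fact 1
     • A always has an eigenvalue of 1
     • Since every column sum of A is 1, A^T has 1 as an eigenvalue, with the all-ones vector e as its eigenvector: $A^T e = e$
     • So 1 is also an eigenvalue of A, since A and A^T have the same characteristic polynomial: $\det(A - \lambda I) = \det\big((A - \lambda I)^T\big) = \det(A^T - \lambda I)$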

  9. Fact 1 (Another proof)

  10. Eigenvalue Decomposition

  11. Fact 2
     • A has 1 as its eigenvalue of maximum magnitude
     • A^n x approaches the PageRank vector as long as the power n is big enough and x sums to 1.
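A hedged sketch of why this limit holds is given below. It assumes that A is diagonalizable with eigenpairs $(\lambda_i, v_i)$, that $\lambda_1 = 1$, and that $|\lambda_i| < 1$ for $i \geq 2$. The notation is an assumption, since the slide's own derivation is not in the transcript, and N is used for the matrix size so that n can remain the power.

$$
x = \sum_{i=1}^{N} c_i v_i
\quad\Longrightarrow\quad
A^n x = \sum_{i=1}^{N} c_i \lambda_i^n v_i
= c_1 v_1 + \sum_{i=2}^{N} c_i \lambda_i^n v_i
\;\longrightarrow\; c_1 v_1 \quad (n \to \infty).
$$

Since the columns of A sum to 1, multiplying by A preserves the sum of the elements of x. If x sums to 1, the limit $c_1 v_1$ also sums to 1, so it is the PageRank vector regardless of which starting x was chosen.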

  12. Example
     • A tiny web
     • Transition matrix A
     • When p = 0.85, the page rank is computed via pagerank.m
     • [Figure: a tiny web of six pages, numbered 1 to 6]
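The example can be imitated end to end with the Python sketch below. It is not the `pagerank.m` referenced on the slide: the function here is a stand-in written for illustration, the six-page link list is hypothetical (the figure's links are not recoverable from the transcript), and pages without out-links are assumed to jump uniformly to any page.

```python
def pagerank(links, n, p=0.85, tol=1e-12, max_iter=1000):
    """Power-method PageRank for pages 1..n.

    links: list of distinct (source, target) pairs. This is an illustrative
    stand-in, not the pagerank.m mentioned on the slide.
    """
    out_links = {j: [] for j in range(1, n + 1)}
    for j, i in links:
        out_links[j].append(i)

    x = [1.0 / n] * n
    for _ in range(max_iter):
        # Rank that every page receives equally: random jumps from pages
        # with links, plus uniform jumps from pages without links (assumed).
        base = 0.0
        for j in range(1, n + 1):
            if out_links[j]:
                base += (1.0 - p) / n * x[j - 1]
            else:
                base += x[j - 1] / n
        x_new = [base] * n
        # Rank passed along hyperlinks, split evenly over each page's links.
        for j, targets in out_links.items():
            for i in targets:
                x_new[i - 1] += p * x[j - 1] / len(targets)
        change = sum(abs(a - b) for a, b in zip(x_new, x))
        x = x_new
        if change < tol:
            break
    return x


# Hypothetical six-page web; page 5 has no out-links.
links = [(1, 2), (1, 6), (2, 3), (2, 4), (3, 4), (3, 5), (3, 6), (4, 1), (6, 1)]
ranks = pagerank(links, 6)
for page, r in enumerate(ranks, start=1):
    print(page, round(r, 4))
```

This version never forms A explicitly. It stores only the link lists and applies the two transition types directly, which is the property that makes the power method usable when G is huge and sparse.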
