Page Rank Modifications & Alternatives

Page Rank Modifications & Alternatives. Brett Harper. Overview. Computing Customized Page Ranks Adaptive Ranking of Web Pages Generalizing PageRank Damping Functions for Link-Based Ranking Algorithms An Approach to Confidence Based Page Ranking for User-Oriented Web Search

Presentation Transcript


  1. Page Rank Modifications & Alternatives Brett Harper

  2. Overview • Computing Customized Page Ranks • Adaptive Ranking of Web Pages • Generalizing PageRank Damping Functions for Link-Based Ranking Algorithms • An Approach to Confidence Based Page Ranking for User-Oriented Web Search • Web Page Ranking using Link Attributes

  3. Computing Customized Page Ranks • A page's rank usually depends on how relevant the document is to a query and on the quality of the document. • PageRank introduces the notion of document authority. • This is similar to ranking papers by their citations. • Most proposed web ranking algorithms are based on connectivity rather than content. • For customized ranks, the notion of page importance depends on the situation.

  4. Computing Customized Page Ranks • Current solutions build different ranks for different topics, users, or queries. • Goal: build the ranking function automatically from a set of user examples.

  5. Computing Customized Page Ranks • Brin & Page's PageRank • Generalized PageRank, where x is a vector containing the ranks, W is an n*n matrix, and e is an n-vector. • Parametric PageRank, where the a's sum to 1.
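
The equations on this slide did not survive extraction. As a hedged sketch, assume the generalized form x = dWx + e, which is consistent with the (I-dW)^-1 term used on slide 12, with W[i][j] = 1/outdeg(j) when page j links to page i. Setting every entry of e to (1-d) gives the classic Brin & Page ranks, and raising one entry of e boosts that page. The toy graph, d = 0.85, and the helper names are illustrative assumptions, not taken from the paper.

```python
def link_matrix(n, edges):
    """Build W with W[i][j] = 1/outdeg(j) for each link j -> i.
    W is column-stochastic when every page has at least one out-link."""
    outdeg = [0] * n
    for src, _ in edges:
        outdeg[src] += 1
    W = [[0.0] * n for _ in range(n)]
    for src, dst in edges:
        W[dst][src] += 1.0 / outdeg[src]
    return W


def generalized_pagerank(W, e, d=0.85, tol=1e-12, max_iter=10_000):
    """Solve x = d*W*x + e by fixed-point iteration (converges for 0 <= d < 1)."""
    n = len(e)
    x = [0.0] * n
    for _ in range(max_iter):
        new = [d * sum(W[i][j] * x[j] for j in range(n)) + e[i] for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    return x


edges = [(0, 1), (1, 2), (2, 0), (2, 1)]   # toy graph: (source, target)
W = link_matrix(3, edges)
d = 0.85
classic = generalized_pagerank(W, [1 - d] * 3, d)            # e = (1-d) for every page
boosted = generalized_pagerank(W, [1 - d, 1 - d, 0.5], d)    # extra weight on page 2
```

Because the entries of (I-dW)^-1 are non-negative, increasing one entry of e can only raise ranks, and it raises the boosted page the most directly.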

  6. Computing Customized Page Ranks • User requirements are represented as the constraints of an optimization problem. • The paper does not discuss how the constraints are obtained. • A cost function (quadratic or linear) allows the ranks to be changed in accordance with the requirements. • Methods for infeasible requirements: • Penalty function • Number of satisfied constraints, considered in addition to the cost function.

  7. Computing Customized Page Ranks • WT10G data set • Constraints defined • Adaptive rank computed • Compared with PageRank on the entire WT10G data set

  8. Computing Customized Page Ranks

  9. Computing Customized Page Ranks

  10. Adaptive Ranking of Web Pages • Alter PageRank by modifying the PageRank equation. • Can be done from the perspective of users or website administrators. • Modify the rank by changing the (1-d) term in the original PageRank. • Dynamic Control • Static Control

  11. Adaptive Ranking of Web Pages • Rules: B is an r*n matrix and b is a rule vector of size r. • Inputs and outputs should be positive. • The cost function allows the ranks of certain pages to be modified while keeping the current ranks of the other pages.

  12. Adaptive Ranking of Web Pages • The initial solution was to structure the problem as a quadratic programming problem. • A second solution uses clustering to reduce the number of dimensions. • Pages are clustered based on score. • Vector E contains k parameters, one per cluster. • Vector Ai is the sum of the columns of (I-dW)^-1 that correspond to class i.

  13. Adaptive Ranking of Web Pages • Vector E contains k parameters. • Vector Ai is the sum of the columns of M = (I-dW)^-1 that correspond to class i. • H is defined as BA. • The cost function consists of a quadratic term and a linear term.
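
The clustering trick rests on linearity: if every page in cluster i shares the parameter ei, then x = (I-dW)^-1 E = e1*A1 + ... + ek*Ak. A minimal sketch, assuming the generalized form x = dWx + E and a made-up 4-page graph; each Ai is obtained by solving with the cluster's indicator vector rather than by inverting I - dW.

```python
def solve(W, e, d, tol=1e-12, max_iter=10_000):
    """Solve x = d*W*x + e by fixed-point iteration (column-stochastic W, 0 <= d < 1)."""
    n = len(e)
    x = [0.0] * n
    for _ in range(max_iter):
        new = [d * sum(W[i][j] * x[j] for j in range(n)) + e[i] for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    return x


def cluster_vectors(W, clusters, d):
    """A_i = sum of the columns of M = (I - dW)^-1 belonging to cluster i,
    computed as M applied to the cluster's indicator vector (M is never formed)."""
    n = len(W)
    return [solve(W, [1.0 if j in members else 0.0 for j in range(n)], d)
            for members in clusters]


def rank_from_parameters(A, params):
    """x = e_1*A_1 + ... + e_k*A_k: the rank vector as a function of k parameters."""
    n = len(A[0])
    return [sum(p * a[i] for p, a in zip(params, A)) for i in range(n)]


# Toy graph (assumed): W[i][j] = 1/outdeg(j) if page j links to page i.
W = [[0.0, 0.0, 0.5, 0.0],
     [1.0, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 1.0],
     [0.0, 0.5, 0.0, 0.0]]
d = 0.85
clusters = [{0, 1}, {2, 3}]      # k = 2 clusters
A = cluster_vectors(W, clusters, d)
params = [0.1, 0.3]              # e_1, e_2
x = rank_from_parameters(A, params)
```

Since x depends on only k numbers, a rule matrix B acting on x becomes H = BA acting on e1,...,ek, which keeps the quadratic program small.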

  14. Adaptive Ranking of Web Pages • Contradictory constraints • Relax the constraints to arrive at a sub-optimal solution. • Add s to the cost function (s balances the importance of the constraints against the original cost function).

  15. Adaptive Ranking of Web Pages • Use a clustering algorithm to split the web pages into k clusters. • Compute A1, ..., Ak. • If a feasible solution exists, use the first formula to find the optimal parameters e1,...,ek. • If no feasible solution exists, use the version with relaxed constraints to find sub-optimal parameters e1,...,ek. • Compute the rank as x = e1*A1 + ... + ek*Ak.

  16. Adaptive Ranking of Web Pages • Used the WT10G data set for the experiments. • First experiment: swap the importance of two pages located some distance Δ apart. • This effectively modifies the PageRank. • Constraints on highly ranked pages disturb the rest of the pages more significantly. • These disruptions appear in blocks due to clustering. • When two pages are swapped, the effect is greater on lower-ranked pages than on higher-ranked ones. • The quality of the results is influenced by the # of clusters.

  17. Adaptive Ranking of Web Pages • Second experiment: change the # of clusters. • Gradually increase the # of clusters used from 5 to 100. • The cost function stops improving at ~60 clusters. • Clustering can reduce the complexity of the problem. • The # of clusters is quite small compared to the size of the collection.

  18. Adaptive Ranking of Web Pages • Clustering techniques • Cluster by score • Cluster by rank (variable-sized cluster dimensions) • Cluster by rank with fixed size cluster dimensions
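
Two of these clustering strategies can be sketched as follows, with pages indexed 0..n-1 and scores such as PageRank values. Splitting the score range into equal-width bins for "cluster by score" is an assumption; the paper's exact procedure may differ.

```python
def cluster_by_score(scores, k):
    """Assumed variant: split the score range into k equal-width bins."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / k or 1.0          # guard against all scores being equal
    clusters = [set() for _ in range(k)]
    for page, s in enumerate(scores):
        idx = min(int((s - lo) / width), k - 1)   # the top score falls in the last bin
        clusters[idx].add(page)
    return [c for c in clusters if c]     # drop empty bins


def cluster_by_rank_fixed(scores, k):
    """Sort pages by score (descending) and cut the ranking into k groups
    whose sizes differ by at most one page."""
    order = sorted(range(len(scores)), key=lambda p: scores[p], reverse=True)
    size, extra = divmod(len(order), k)
    clusters, start = [], 0
    for i in range(k):
        end = start + size + (1 if i < extra else 0)
        clusters.append(set(order[start:end]))
        start = end
    return clusters


scores = [0.9, 0.1, 0.5, 0.45, 0.05, 0.2]   # made-up scores for six pages
by_score = cluster_by_score(scores, 3)
by_rank = cluster_by_rank_fixed(scores, 3)
```

Score bins can be very uneven when a few pages dominate, whereas fixed-size rank clusters give every cluster the same number of pages.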

  19. Adaptive Ranking of Web Pages • PageRanks can be modified, but constraints on some pages cause the ranks of all pages to be affected. • The effect of these constraints depends on how highly ranked the constrained pages are.

  20. Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms • Damping functions reduce the propagation of page importance along long paths. • Focus on linear, exponential, and hyperbolic decay. • Exponential damping corresponds to the original PageRank.

  21. Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms • Functional rankings are computed from a link matrix. • Normalization • Dangling nodes • The rank is then defined on P, the matrix that results after normalization.

  22. Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms • An equivalent approach takes into account the branching contribution. • Rank of a node is the weighted sum of incoming paths, with weights that decay exponentially with path length. • PageRank is a functional ranking where the damping function is (1-α)α^t.
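
The path-weighting view can be checked numerically: a functional ranking sums damping(t) * vP^t over path lengths t, and with damping(t) = (1-α)α^t it reproduces PageRank. A hedged sketch; the toy graph, the uniform v, and giving dangling nodes a uniform row are assumptions rather than details from the slide.

```python
def normalize(adj):
    """Row-normalize an adjacency list into a row-stochastic matrix P.
    Dangling nodes (no out-links) get a uniform row: one common fix, assumed here."""
    n = len(adj)
    P = []
    for targets in adj:
        if targets:
            row = [0.0] * n
            for j in targets:
                row[j] += 1.0 / len(targets)
        else:
            row = [1.0 / n] * n
        P.append(row)
    return P


def times_P(r, P):
    """Row vector times matrix: (rP)[j] = sum_i r[i] * P[i][j]."""
    n = len(P)
    return [sum(r[i] * P[i][j] for i in range(n)) for j in range(n)]


def functional_rank(P, damping, T):
    """r = sum_{t=0}^{T-1} damping(t) * v P^t, with v the uniform distribution."""
    n = len(P)
    term = [1.0 / n] * n          # v P^0
    r = [0.0] * n
    for t in range(T):
        w = damping(t)
        r = [ri + w * ti for ri, ti in zip(r, term)]
        term = times_P(term, P)   # advance to v P^(t+1)
    return r


def pagerank(P, alpha, iters=200):
    """Power iteration for r = alpha * rP + (1 - alpha) * v."""
    n = len(P)
    v = [1.0 / n] * n
    r = v[:]
    for _ in range(iters):
        r = [alpha * x + (1 - alpha) * vi for x, vi in zip(times_P(r, P), v)]
    return r


alpha = 0.85


def exponential_damping(t):
    return (1 - alpha) * alpha ** t


P = normalize([[1], [2], [0, 1], []])   # toy graph; page 3 is dangling
r_functional = functional_rank(P, exponential_damping, T=200)
r_pagerank = pagerank(P, alpha)
```

Swapping `exponential_damping` for another function of t is all it takes to get the other damping schemes discussed next.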

  23. Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms

  24. Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms • Linear Damping
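
The linear damping formula on this slide was an image. A hedged sketch, assuming the normalized form damping(t) = 2(L-t)/(L(L+1)) for t < L and 0 otherwise, so the weights sum to 1 and only paths shorter than L contribute. The toy matrix P and L = 8 (echoing the L = 8 or 9 reported on slide 26) are illustrative.

```python
def linear_damping(L):
    """Assumed LinearRank weights: 2(L - t) / (L(L + 1)) for t < L, else 0."""
    def damping(t):
        return 2.0 * (L - t) / (L * (L + 1)) if t < L else 0.0
    return damping


def functional_rank(P, damping, T):
    """r = sum_{t<T} damping(t) * v P^t, with v the uniform distribution."""
    n = len(P)
    term = [1.0 / n] * n
    r = [0.0] * n
    for t in range(T):
        w = damping(t)
        r = [ri + w * ti for ri, ti in zip(r, term)]
        term = [sum(term[i] * P[i][j] for i in range(n)) for j in range(n)]
    return r


# Row-stochastic link matrix of a toy 3-page graph (assumed).
P = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.5, 0.5, 0.0]]
L = 8
weights = [linear_damping(L)(t) for t in range(L + 2)]
r_linear = functional_rank(P, linear_damping(L), T=L)   # only L terms are non-zero
```

Because the weights vanish after L steps, the sum needs exactly L vector-matrix products, which is one reason a fixed iteration budget suits LinearRank.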

  25. Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms • Hyperbolic Damping

  26. Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms • Empirical Damping • Pages that are linked are similar, but the topic changes as the distance increases. • Use the decrease in text similarity as an approximation to an empirical damping function. • .uk domain, 18M pages, 200 pages chosen at random, similarity measured using TF-IDF without stemming or stop-word removal • Results show that this is better approximated by linear damping with L=8 or 9 than by exponential damping.

  27. Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms

  28. Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms • Approximating Hyperbolic with Exponential Damping • For given values of β and the maximum path length l, find the α that minimizes the difference between the weights.

  29. Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms • Approximating Exponential with Linear Damping • For given values of α and the maximum path length l, find the L that minimizes the difference between the weights.
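
This fitting step can be sketched as a grid search. The squared-difference criterion, the search range for L, and l = 50 are assumptions, and the linear weights use the assumed LinearRank form 2(L-t)/(L(L+1)). The same search with the roles swapped would fit an exponential α to hyperbolic weights, as on slide 28.

```python
def exponential_weights(alpha, l):
    """PageRank damping (1 - alpha) * alpha**t for t = 0..l."""
    return [(1 - alpha) * alpha ** t for t in range(l + 1)]


def linear_weights(L, l):
    """Assumed LinearRank damping 2(L - t)/(L(L + 1)) for t < L, else 0, for t = 0..l."""
    return [2.0 * (L - t) / (L * (L + 1)) if t < L else 0.0 for t in range(l + 1)]


def squared_difference(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_linear_L(alpha, l, max_L=200):
    """Grid search for the L whose linear weights are closest (sum of squared
    differences, an assumed metric) to the exponential weights up to path length l."""
    target = exponential_weights(alpha, l)
    return min(range(1, max_L + 1),
               key=lambda L: squared_difference(target, linear_weights(L, l)))


fits = {alpha: best_linear_L(alpha, l=50) for alpha in (0.5, 0.7, 0.85)}
```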

  30. Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms • Parameters for the damping function • The characteristic path length (average distance between two nodes) grows sub-logarithmically with the size of the graph. • For a smaller graph, the damping function should decay faster. • For rankings on two graphs with average path lengths L1 and L2 to behave in a similar way, the sums of their weights up to L1 and L2, respectively, have to be similar.

  31. Generalizing PageRank: Damping Functions for Link-Based Ranking Algorithms • Experimental comparison of precision (PageRank vs. LinearRank) • Used the WebTREC Gov2 collection (25M documents, .gov domain, 2004). • Chose 50 queries at random. • PageRank took 39 iterations. LinearRank was run for 5, 10, and 20 iterations. • After the first 5 results, LinearRank had precision similar to PageRank. • Useful when rankings can't be computed in advance.

  32. An Approach to Confidence Based Page Ranking for User Oriented Web Search • Confidence is the probability of accessing a page for a specific query given past behavior. • Use this probability to enhance the page rankings of the most relevant pages. • Should also take link structure into account. • Merge pages with similar categories, since users lose interest after the first few results.

  33. An Approach to Confidence Based Page Ranking for User Oriented Web Search • Extract important features and categories from web pages. • Prune pages from the graph that are not relevant. • Calculate confidence for all features and categories of each page. • Use citations (link structure) and confidence measure to recursively compute the page rank.

  34. An Approach to Confidence Based Page Ranking for User Oriented Web Search • Extract important features and categories from web pages. • Search the full text and extended anchor text for the most relevant features/categories. • E(P,a) is computed over the queries i in the set of features, where N(P,i) is the total # of times page P is accessed for query i and O(i) is the total number of queries made for i. • Pages with high E(P,a) will likely be accessed for the topic a.

  35. An Approach to Confidence Based Page Ranking for User Oriented Web Search • Prune pages from the graph that are not relevant. • Pages without similar features/categories can be connected. • These pages are used for extracting features/categories, but are pruned if their confidence does not meet a certain threshold. • Citations of pruned pages are also removed.

  36. An Approach to Confidence Based Page Ranking for User Oriented Web Search • Calculate confidence for all features and categories of each page. • C(a,P) is computed for each page P in the customized graph. • Calculating C(a,P) over the entire history is not realistic, so only recent history is taken into account.

  37. An Approach to Confidence Based Page Ranking for User Oriented Web Search • Use citations (link structure) and the confidence measure to recursively compute the page rank. • PR(P,a) = (1-d) + d[PR(T1,a)/O(T1)+...+ PR(Tn,a)/O(Tn)], where Ti is a citing page and O(Ti) is the # of outgoing links. • RPR(P,a) = PR(P,a) * C(a,P) • New pages cited by many relevant high-ranked pages; this can be suppressed by including a time period. • Substitute the damping factor d with (1-C(a,P)).
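
The recursion on this slide can be sketched directly. The toy graph, the confidence values C(a,P), and the fixed 200 iterations are assumptions. With a constant d this is the classic per-page PageRank; passing `conf` switches to the slide's substitution d = 1 - C(a,P), so a page's base term becomes C(a,P) and it relies less on incoming citations as its confidence grows.

```python
def confidence_pagerank(citations, out_degree, conf=None, d=0.85, iters=200):
    """PR(P,a) = (1 - d_P) + d_P * sum(PR(T,a) / O(T) for each page T citing P).

    citations[P] lists the pages T1..Tn that cite P; out_degree[T] is O(T).
    If conf (C(a,P) per page) is given, use d_P = 1 - C(a,P);
    otherwise every page uses the constant d.
    """
    pr = {p: 1.0 for p in citations}
    for _ in range(iters):
        new = {}
        for p, citing in citations.items():
            d_p = (1.0 - conf[p]) if conf is not None else d
            new[p] = (1.0 - d_p) + d_p * sum(pr[t] / out_degree[t] for t in citing)
        pr = new
    return pr


def relevance_rank(pr, conf):
    """RPR(P,a) = PR(P,a) * C(a,P)."""
    return {p: pr[p] * conf[p] for p in pr}


# Toy graph for one topic a (assumed): A -> B, A -> C, B -> C, C -> A.
citations = {"A": ["C"], "B": ["A"], "C": ["A", "B"]}
out_degree = {"A": 2, "B": 1, "C": 1}
conf = {"A": 0.9, "B": 0.2, "C": 0.5}   # hypothetical C(a,P) values

pr = confidence_pagerank(citations, out_degree)          # constant d = 0.85
rpr = relevance_rank(pr, conf)
pr_conf_damped = confidence_pagerank(citations, out_degree, conf=conf)
```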

  38. An Approach to Confidence Based Page Ranking for User Oriented Web Search • The data set was constructed from a list of 7 queries, whose top 30 results were obtained from Google. • A graph of these nodes was then created and further expanded to a depth of 2. This new graph contained 500-800 nodes. • Higher-ranked pages are not always accessed a higher number of times. • Pages can be accessed for multiple queries. • Pages with higher confidence tend to be ranked higher.

  39. Web Page Ranking using Link Attributes • Tries to improve on current ranking techniques by assigning different weights to links. (WLRank) • Relative position in the page • Tag where the link is contained • Length of anchor text

  40. Web Page Ranking using Link Attributes • L(j,i) is 1 if a link exists and 0 otherwise, and c is a constant that gives a base weight to every link. • T(j,i) depends on the tag. • AL(j,i) is the length of the anchor text divided by the average anchor text length d. • RP(j,i) is the relative position, weighted by the constant b. • If W(j,i) = L(j,i), then it is equal to PageRank.
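
The slide's formula for W(j,i) was an image, so how the components are combined is not recoverable here. The sketch below uses a weighted PageRank in which each page splits its rank over its out-links in proportion to W(j,i); that normalization, the additive `attribute_weight`, the tag weights, and the toy graph are all hypothetical. With W(j,i) = L(j,i) it reduces to PageRank, matching the slide's last bullet. The slide's average-anchor-length constant d is renamed `avg_anchor` to avoid clashing with the damping factor.

```python
def weighted_pagerank(links, weight, d=0.85, iters=200):
    """PR(i) = (1 - d) + d * sum_j PR(j) * W(j,i) / sum_k W(j,k).

    links[j] lists the pages j links to; weight(j, i) > 0 is the weight of j -> i.
    Normalizing each page's out-weights to 1 is an assumption.
    """
    pr = {p: 1.0 for p in links}
    for _ in range(iters):
        new = {p: 1.0 - d for p in links}
        for j, targets in links.items():
            total = sum(weight(j, i) for i in targets)
            for i in targets:
                new[i] += d * pr[j] * weight(j, i) / total
        pr = new
    return pr


# Hypothetical link attributes: (tag, anchor length, relative position in [0, 1]).
attrs = {("A", "B"): ("h1", 40, 0.1), ("A", "C"): ("p", 5, 0.9),
         ("B", "C"): ("b", 12, 0.5), ("C", "A"): ("p", 20, 0.3)}
TAG_WEIGHT = {"b": 1.0, "h1": 2.0}   # hypothetical T(j,i) values; other tags get 0
c, b, avg_anchor = 1.0, 1.0, 100.0   # c=1, b=1, d=100 as in the slide's experiment


def attribute_weight(j, i):
    """Hypothetical additive combination of the slide's components."""
    tag, anchor_len, rel_pos = attrs[(j, i)]
    return c + TAG_WEIGHT.get(tag, 0.0) + anchor_len / avg_anchor + b * rel_pos


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
plain = weighted_pagerank(links, lambda j, i: 1.0)   # W = L: classic PageRank
wlrank = weighted_pagerank(links, attribute_weight)
```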

  41. Web Page Ranking using Link Attributes • Tested against 460k pages in the .CL domain. • Several users provided relevance judgements on the first 10 results of several queries. • Used c=1, b=1, and d=100. • Only used weights for <b> and <h1> tags. • Compare precision based on a perfect ranking for the first 10 answers. • Improvement of 13% on average.
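
Precision over the first 10 answers and the relative improvement can be computed as below. Normalizing by the precision of a perfect ranking is an assumed reading of the slide's evaluation, and the document IDs and judgements are made up for illustration.

```python
def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k results that were judged relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def precision_vs_perfect(ranked, relevant, k=10):
    """Precision@k divided by the precision@k of a perfect ranking
    (all relevant documents first); an assumed reading of the slide."""
    best = min(len(relevant), k) / k
    return precision_at_k(ranked, relevant, k) / best if best else 0.0


def relative_improvement(baseline, new):
    """E.g. 0.13 for a 13% improvement over the baseline score."""
    return (new - baseline) / baseline


ranked = [f"d{i}" for i in range(1, 21)]     # hypothetical result list
relevant = {"d2", "d5", "d14", "d17"}        # hypothetical user judgements
p10 = precision_at_k(ranked, relevant)
p10_normalized = precision_vs_perfect(ranked, relevant)
```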

  42. Web Page Ranking using Link Attributes

  43. Conclusions • PageRank can be modified to fit user requirements and specific categories. • Different functions can be used to decay PageRank influence on path lengths. • Can improve PageRank through clustering.

  44. References • Tsoi, A. C., Hagenbuchner, M., and Scarselli, F. 2006. Computing customized page ranks. ACM Trans. Internet Technol. 6, 4 (Nov. 2006), 381-414. • Tsoi, A. C., Morini, G., Scarselli, F., Hagenbuchner, M., and Maggini, M. 2003. Adaptive ranking of web pages. In Proceedings of the 12th international Conference on World Wide Web (Budapest, Hungary, May 20 - 24, 2003). WWW '03. ACM, New York, NY, 356-365. • Baeza-Yates, R., Boldi, P., and Castillo, C. 2006. Generalizing PageRank: damping functions for link-based ranking algorithms. In Proceedings of the 29th Annual international ACM SIGIR Conference on Research and Development in information Retrieval (Seattle, Washington, USA, August 06 - 11, 2006). SIGIR '06. ACM, New York, NY, 308-315. • Mukhopadhyay, D., Giri, D., and Singh, S. R. 2003. An approach to confidence based page ranking for user oriented Web search. SIGMOD Rec. 32, 2 (Jun. 2003), 28-33. • Baeza-Yates, R. and Davis, E. 2004. Web page ranking using link attributes. In Proceedings of the 13th international World Wide Web Conference on Alternate Track Papers & Posters (New York, NY, USA, May 19 - 21, 2004). WWW Alt. '04. ACM, New York, NY, 328-329.

  45. Questions