Hybrid Prefetching for WWW Proxy Servers. Yui-Wen Horng, Wen-Jou Lin, Hsing Mei. Department of Computer Science and Information Engineering, Fu Jen Catholic University, Taiwan, R.O.C. International Conference on Parallel and Distributed Systems, 1998. Mikt Tien, Miketien@syslab.cse.yzu.edu.tw
Outline • 1. Introduction • 2. Related Work • 3. Prefetching Mechanism • 4. Experiment Results • 5. Conclusion and Future Work
1. Introduction • Depending on where the cache is located, caches fall into three types: client cache, server cache, and proxy cache. • Some studies show that the maximum possible hit rate of a proxy cache is about 30%-50%. Prefetching is a clear way to overcome this limit. • Prefetchers can likewise be classified into three types: client prefetcher, server prefetcher, and proxy prefetcher. • A client prefetcher can analyze one user's requests to predict future requests; a proxy prefetcher can gather information about requests from many clients to many servers.
2. Related Work • Interactive Prefetching Proxy Server (Wcol) (content parsing) -- Finds linked documents, including images, by parsing HTML pages. -- Advantage: the cache hit rate is more than 60%. -- Disadvantage: the traffic is 4.12 times that of a normal caching proxy, and parsing HTML adds overhead to the server.
Related Work (cont.) • Top-10 Approach -- Requires cooperation among the web server, the proxy, and the client browser. Higher-level servers know which documents are popular among their lower-level clients. -- Advantage: the hit rate is more than 40%, and the traffic increase is no more than 10% in most cases. -- Disadvantage: to achieve good prediction, every proxy and server must follow the same policy, which is the major obstacle to implementation.
Related Work (cont.) • Predictive Prefetching -- The prefetcher is installed in the client but communicates with a prediction engine that is part of the web server. The engine tracks client request sequences and builds a dependency graph containing probability information, so the prefetcher can prefetch files that have a high access probability. -- Disadvantage: requires a specially designed protocol or a modification to HTTP.
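A dependency graph of this kind can be sketched as follows. This is only an illustration, not the engine described in the cited work: the function names `build_dependency_graph` and `predict`, the choice to count only directly consecutive requests, and the `min_probability` cutoff are all assumptions made for the example.

```python
from collections import defaultdict


def build_dependency_graph(sequences):
    """Estimate P(next | current) from per-client request sequences.

    Assumption: only directly consecutive requests count as a dependency.
    """
    follows = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            follows[current][nxt] += 1
            totals[current] += 1
    # Normalise the counts into probabilities among observed successors.
    return {
        src: {dst: n / totals[src] for dst, n in dsts.items()}
        for src, dsts in follows.items()
    }


def predict(graph, url, min_probability):
    """Return successors of `url` whose estimated probability is high enough."""
    successors = graph.get(url, {})
    return sorted(dst for dst, p in successors.items() if p >= min_probability)
```

With four observed sequences in which `/a` is followed by `/b` three times and by `/c` once, `predict(graph, "/a", 0.5)` would return only `/b`.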
Related Work (cont.) • Prefetching Files System for WWW Servers -- Uses the “Referer” information contained in HTTP request messages to build an access probability graph. “Referer” is an HTTP request header that indicates which URL the requested URL was linked from. -- Advantage: the response time can be reduced by more than 20%. -- Disadvantage: not all requests contain this header, and it takes time to accumulate enough data to build the graph.
Related Work (cont.) • Our approach -- A hybrid prefetcher that both parses HTML and builds an access probability graph. To prefetch more intelligently, both access popularity and access probability are considered.
3.1 Problem 1: How to find more documents that may be requested in the near future? • Prefetch by Parsing HTML -- Needs no information from past request history and can find related URLs even if the requested URL has never been retrieved before. -- However, it increases both server overhead and traffic.
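Prefetching by parsing can be illustrated with a minimal sketch built on Python's standard `html.parser` module. The class `LinkExtractor` and the function `extract_prefetch_candidates` are hypothetical names, and restricting the scan to `<a href>` and `<img src>` is an assumption; the slides do not say which tags the authors' parser handles.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs of links and inline images in one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # <a href=...> gives linked documents, <img src=...> gives images.
        wanted = {"a": "href", "img": "src"}.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                url = urljoin(self.base_url, value)  # resolve relative links
                if url not in self.urls:             # skip duplicates
                    self.urls.append(url)


def extract_prefetch_candidates(base_url, html_text):
    """Return the URLs a prefetcher could fetch after serving this page."""
    parser = LinkExtractor(base_url)
    parser.feed(html_text)
    return parser.urls
```

Every served page has to go through a pass like this, which is where the extra server overhead comes from, and fetching every extracted URL is what inflates the traffic.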
3.1 Problem 1: How to find more documents that may be requested in the near future? (cont.) • Prefetch by Referer -- Builds a “referer link graph”. -- The accumulated weight of each node and edge can also be used to calculate the access probability, which is useful for prefetching. -- Disadvantage: maintaining the graph increases memory overhead, and not all requests contain referer information.
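One way to picture the referer link graph is the sketch below. The slides do not give the exact formula, so the probability used here (edge weight divided by the weight of the referring node, capped at 1) is an assumption, as are the class and method names.

```python
from collections import defaultdict


class RefererLinkGraph:
    """Weighted graph built from (Referer, requested URL) pairs."""

    def __init__(self):
        self.node_weight = defaultdict(int)  # times each URL was requested
        self.edge_weight = defaultdict(int)  # times referer -> url was followed

    def record(self, url, referer=None):
        """Account for one request; `referer` is None if the header is absent."""
        self.node_weight[url] += 1
        if referer is not None:
            self.edge_weight[(referer, url)] += 1

    def access_probability(self, referer, url):
        """Assumed estimate: share of visits to `referer` that led to `url`."""
        visits = self.node_weight.get(referer, 0)
        if visits == 0:
            return 0.0
        followed = self.edge_weight.get((referer, url), 0)
        # Cap at 1.0: the referring page may have been served from a
        # browser cache, so the proxy can see more follows than visits.
        return min(1.0, followed / visits)
```

Both dictionaries grow with the number of distinct URLs and links seen, which is the memory overhead mentioned above.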
3.1 Problem 1: How to find more documents that may be requested in the near future? (cont.) • Hybrid Prefetch -- If a referer exists, use it to build the “referer link graph”; otherwise, parse the HTML file to build the link graph. -- Fewer HTML files need parsing than in the first approach (Prefetch by Parsing HTML), so the CPU overhead is smaller.
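The hybrid rule amounts to a single branch per request, sketched below. The function name `update_link_graph`, the dict-of-sets graph, and the `parse_links` callable (any function that returns the URLs found in an HTML body) are hypothetical and stand in for whatever the authors' proxy actually uses.

```python
from collections import defaultdict


def update_link_graph(graph, url, referer, html_body, parse_links):
    """Add edges for one request and report which source supplied them.

    `graph` maps a source URL to the set of URLs reachable from it.
    """
    if referer:
        # Cheap path: the browser already told us which page linked here.
        graph[referer].add(url)
        return "referer"
    # Fallback: no Referer header, so parse the document itself.
    for target in parse_links(html_body):
        graph[url].add(target)
    return "parsed"


# Usage with a toy parser that treats whitespace-separated tokens as links.
graph = defaultdict(set)
update_link_graph(graph, "/b", "/a", "", lambda body: [])
update_link_graph(graph, "/c", None, "/d /e", lambda body: body.split())
```

Only requests that arrive without a Referer header reach the parser, which is why the CPU cost is lower than parsing every page.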
3.1 Problem 1: How to find more documents that may be requested in the near future? (cont.) • Prefetch by Directory -- Assumption: related documents are usually placed in the same directory on the web server. -- If the directory structure of the web site does not match this assumption, the ratio of successful prefetching may be low.
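The directory heuristic can be sketched as a filter over URLs the proxy already knows about. The helper names and the idea of comparing host plus directory path are assumptions for illustration; the slides do not say how the candidate set is obtained.

```python
import posixpath
from urllib.parse import urlsplit


def directory_of(url):
    """Return (host, directory path) for a URL."""
    parts = urlsplit(url)
    return parts.netloc, posixpath.dirname(parts.path)


def same_directory_candidates(requested_url, known_urls):
    """Pick known URLs that live in the same server directory as the request."""
    target = directory_of(requested_url)
    return [
        url for url in known_urls
        if url != requested_url and directory_of(url) == target
    ]
```

For a request to `http://h/docs/a.html`, a known `http://h/docs/b.html` would be selected, while `http://h/img/c.gif` and documents on other hosts would not.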
3.2 Problem 2: How to increase the ratio of prefetched documents that are actually requested? • Popularity Constraint -- Builds a table to track the popularity of each requested document. The table is updated whenever a new request arrives. • Probability Constraint --
3.2 Problem 2: How to increase the ratio of prefetched documents that are actually requested? (cont.) • Combined Constraint -- Combines both constraints with a logical “OR”: a document is prefetched if it passes either constraint.
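The popularity table and the OR-combination can be sketched as follows. The threshold parameters `min_popularity` and `min_probability`, the use of `>=` comparisons, and all names are assumptions; the slides state only that a document is prefetched if it passes either constraint.

```python
from collections import Counter


class PopularityTable:
    """Tracks how often each document has been requested."""

    def __init__(self):
        self.counts = Counter()

    def record_request(self, url):
        # Called once for every incoming request.
        self.counts[url] += 1

    def popularity(self, url):
        # Counter returns 0 for URLs that were never requested.
        return self.counts[url]


def passes_combined_constraint(popularity, probability,
                               min_popularity, min_probability):
    """Prefetch if the document passes either constraint (logical OR)."""
    passes_popularity = popularity >= min_popularity
    passes_probability = probability >= min_probability
    return passes_popularity or passes_probability
```

Under these assumed thresholds, a document requested 5 times with probability 0.1 passes with `min_popularity=3`, and a never-requested document with probability 0.9 passes with `min_probability=0.5`; one that misses both thresholds is not prefetched.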
4. Experiment Results • Experiment A
Experiment B: Popularity Constraint (threshold); prefetch level = 2, cache size = 10 MB
5. Conclusion and Future Work • The hybrid prefetching technique is effective at improving the hit rate of a caching proxy, and its prediction accuracy is higher than that of other methods. • It can achieve a cache hit rate of more than 70% while keeping the increase in traffic below 40%. • Our experiments also show that separate caches are better than one common cache when the total size is small.