Hybrid Prefetching for WWW Proxy Servers

Yui-Wen Horng, Wen-Jou Lin, Hsing Mei
Department of Computer Science and Information Engineering, Fu Jen Catholic University, Taiwan, R.O.C.
International Conference on Parallel and Distributed Systems, 1998
Mikt Tien Miketien@syslab.cse.yzu.edu.tw Syslab Yan Zen

Outline
• 1.Introduction
• 2.Related Work
• 3.Prefetching Mechanism
• 4.Experiment Results
• 5.Conclusion and Future Work

1.Introduction
• Depending on the location of the cache, we can classify caches into three types: client cache, server cache, and proxy cache.
• Some studies show that the maximum possible hit rate of a proxy cache is about 30%-50%. Prefetching is a clear way to overcome this limit.
• Likewise, we classify prefetchers into three types: client prefetcher, server prefetcher, and proxy prefetcher.
• A client prefetcher can analyze one user's requests to predict future requests; a proxy prefetcher can gather information about requests from many clients to many servers.

2.Related Work
• Interactive Prefetching Proxy Server (Wcol) (Content Parsing)
-- Finds linked documents (including images) by parsing HTML pages.
-- Advantage: the cache hit rate is more than 60%.
-- Disadvantage: the traffic is 4.12 times larger than that of a normal caching proxy, and parsing HTML adds overhead to the server.

Related Work(cont.)
• Top-10 Approach
-- Requires cooperation among the web server, proxy, and client browser. Higher-level servers know which documents are popular among their lower-level clients.
-- Advantage: the hit rate is more than 40%, and the traffic increase is no more than 10% in most cases.
-- Disadvantage: to achieve good prediction, every proxy and server must follow the same policy. That is the major implementation problem.

Related Work(cont.)
• Predictive Prefetching
-- The prefetcher is installed in the client but communicates with a prediction engine that is part of the web server. The engine tracks client request sequences and builds a dependency graph containing probability information, so the prefetcher can prefetch files with high access probability.
-- Disadvantage: requires a specially designed protocol or a modification to HTTP.

Related Work(cont.)
• Prefetching Files System for WWW Servers
-- Uses the “Referer” information contained in HTTP request messages to build an access probability graph. “Referer” is an HTTP request header that indicates the URL from which the requested URL was linked.
-- Advantage: the response time can be reduced by more than 20%.
-- Disadvantage: not all requests contain this information, and it takes time to accumulate enough data to build the graph.

Related Work(cont.)
• Our approach
-- A hybrid prefetcher that both parses HTML and builds an access probability graph. To prefetch more intelligently, both access popularity and access probability are considered.

3.Prefetching Mechanism

3.1 Problem 1: How to find more documents that may be requested in the near future?
• Prefetch by Parsing HTML
-- Needs no information from past request history and can find related URLs even if the requested URL was never retrieved before.
-- However, it increases the overhead of the server and increases the traffic.
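Prefetching by parsing HTML can be illustrated with a minimal sketch. It assumes that prefetch candidates are the targets of `<a href>` and `<img src>` tags; the names `LinkExtractor` and `extract_prefetch_candidates` are hypothetical and do not come from the original slides.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs of hyperlinks and inline images in one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "a":
            target = attr_map.get("href")
        elif tag == "img":
            target = attr_map.get("src")
        else:
            target = None
        if target:
            # Resolve relative links against the URL of the parsed page.
            absolute = urljoin(self.base_url, target)
            if absolute not in self.urls:
                self.urls.append(absolute)


def extract_prefetch_candidates(base_url, html_text):
    """Return the de-duplicated URLs linked from html_text, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html_text)
    return parser.urls
```

No request history is needed here, which is why this approach works even for a URL that was never retrieved before; the cost is that every HTML response has to be parsed.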

3.1 Problem 1: How to find more documents that may be requested in the near future?(cont.)
• Prefetch by Referer
-- Builds a “referer link graph”.
-- The accumulated weight of each node and edge can also be used to calculate access probability, which is useful for prefetching.
-- Disadvantage: maintaining the graph increases memory overhead, and not all requests contain referer information.
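A referer link graph with accumulated weights might look like the sketch below. The slides do not give the probability formula, so this sketch assumes that the access probability of an edge is its weight divided by the weight of the referring node; the class and method names are hypothetical.

```python
from collections import defaultdict


class RefererLinkGraph:
    """Accumulates node and edge weights from observed requests."""

    def __init__(self):
        self.node_weight = defaultdict(int)  # url -> number of requests seen
        self.edge_weight = defaultdict(int)  # (referer, url) -> number of follows

    def record_request(self, url, referer=None):
        self.node_weight[url] += 1
        if referer:
            self.edge_weight[(referer, url)] += 1

    def access_probability(self, referer, url):
        """Assumed estimate: edge weight / weight of the referring node."""
        visits = self.node_weight.get(referer, 0)
        if visits == 0:
            return 0.0
        follows = self.edge_weight.get((referer, url), 0)
        # Clamp to 1.0: the referring page may have been served from a
        # browser cache, so the proxy can see more follows than visits.
        return min(1.0, follows / visits)
```

Both dictionaries grow with the number of distinct URLs and links seen, which is the memory overhead the slide mentions.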

3.1 Problem 1: How to find more documents that may be requested in the near future?(cont.)
• Hybrid Prefetch
-- If the referer exists, use it to build the “referer link graph”; otherwise, parse the HTML file to build the link graph.
-- Fewer HTML files require parsing than in the first approach, so the CPU overhead is smaller.
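The hybrid rule can be sketched as a single update function. This is only an illustration: the graph layout (a nested dictionary of edge counts), the zero initial weight for parsed links, and the names `update_link_graph` and `_AnchorParser` are assumptions, not details from the original work.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin


class _AnchorParser(HTMLParser):
    """Collects href targets of <a> tags as absolute URLs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.links.append(urljoin(self.base_url, href))


def update_link_graph(graph, url, referer=None, html_text=None):
    """Update graph for one request; return which source of links was used."""
    if referer:
        # Cheap path: the Referer header already names the edge.
        graph[referer][url] += 1
        return "referer"
    if html_text is not None:
        # Fallback path: parse the page to discover its outgoing links.
        parser = _AnchorParser(url)
        parser.feed(html_text)
        for link in parser.links:
            graph[url].setdefault(link, 0)  # known edge, not yet followed
        return "parsed"
    return "none"
```

A possible usage, with the graph created as `defaultdict(lambda: defaultdict(int))`, is to call `update_link_graph` once per request handled by the proxy; the parser then runs only for requests that arrive without a referer.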

3.1 Problem 1: How to find more documents that may be requested in the near future?(cont.)
• Prefetch by Directory
-- Assumption: related documents are usually placed in the same directory on the web server.
-- If the directory structure of the web site does not match this assumption, the ratio of successful prefetching may be low.
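One way to apply the directory assumption is to keep only candidates that share a host and directory with the requested URL. The sketch below assumes that "same directory" means an identical scheme, host, and path prefix up to the last "/"; the function names are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def same_directory(url_a, url_b):
    """True if both URLs are on the same host and in the same directory."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    if (a.scheme, a.netloc) != (b.scheme, b.netloc):
        return False
    # Treat an empty path ("http://host") as the root directory.
    dir_a = posixpath.dirname(a.path or "/")
    dir_b = posixpath.dirname(b.path or "/")
    return dir_a == dir_b


def filter_by_directory(request_url, candidates):
    """Keep only candidates located in the directory of request_url."""
    return [c for c in candidates if same_directory(request_url, c)]
```

If a site stores images or related pages in separate directories, this filter discards them, which matches the limitation stated on the slide.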

3.2 Problem 2: How to increase the ratio of prefetched documents that are actually requested?
• Popularity Constraint
-- Build a table to track the popularity of each requested document. The table is updated when a new request arrives.
• Probability Constraint
--

3.2 Problem 2: How to increase the ratio of prefetched documents that are actually requested?(cont.)
• Combined Constraint
-- Combines both constraints with a logical “OR”: a document is prefetched if it passes either constraint.
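The combined constraint can be sketched as follows. The slides do not spell out either test, so this sketch assumes that each constraint is a simple threshold (a request count for popularity, an access probability supplied by the caller); `PrefetchFilter` and its threshold parameters are hypothetical names.

```python
from collections import Counter


class PrefetchFilter:
    """Decides whether a candidate document should be prefetched."""

    def __init__(self, popularity_threshold, probability_threshold):
        self.popularity_threshold = popularity_threshold
        self.probability_threshold = probability_threshold
        self.popularity = Counter()  # url -> number of requests seen

    def record_request(self, url):
        # The popularity table is updated on every incoming request.
        self.popularity[url] += 1

    def passes_popularity(self, url):
        return self.popularity[url] >= self.popularity_threshold

    def passes_probability(self, probability):
        return probability >= self.probability_threshold

    def should_prefetch(self, url, probability):
        # Combined constraint: passing either test is enough ("OR").
        return self.passes_popularity(url) or self.passes_probability(probability)
```

Using "OR" rather than "AND" admits more candidates: a globally popular document is prefetched even when the link leading to it is rarely followed, and the reverse also holds.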

4.Experiment Results Experiment A

Experiment B-Popularity Constraint(threshold) prefetch level=2 , cache size =10MB

Experiment B-Probability Constraint

5.Conclusion and Future Work
• The hybrid prefetching technique is effective in improving the hit rate of a caching proxy, and its prediction accuracy is higher than that of other methods.
• It can achieve a cache hit rate of more than 70% while keeping the traffic increase below 40%.
• Our experiments also show that separate caches are better than one common cache when the total size is small.
