What’s a Web Cache? Why do people use them?
• Web cache location
• Web cache purpose
There are two main reasons that Web caches are used:
• to reduce latency: sites seem more responsive
• to reduce traffic: bandwidth requirements stay lower
[Figure: an end user, served by the ISP cache, inside the ISP network, which reaches the Internet over an expensive link.]
Kinds of Web Caches
• Browser Caching
  • The browser cache uses a simple rule: it checks that an object is fresh only once per browser session.
  • It is very useful when the user presses the browser's “Back” button.
• Proxy Caching
  • Usually operated by ISPs to save bandwidth.
Aren’t Web Caches bad for me?
Drawbacks:
• Web caching is one of the most misunderstood technologies on the Internet.
• Caches can serve content that is out of date, or stale.
Advantages:
• Fast-loading sites.
• Replicated content for FREE!
Bear in mind that caches will be used whether you like it or not.
How Do Web Caches Work?
The most common rules:
• Do what the object’s headers tell you to do.
• If the object is authenticated or secure, don’t cache it.
• Freshness: an object is considered fresh, and is served without checking the origin server, if:
  • It has an expiry time or other age-controlling directive and is still within the fresh period.
  • A browser cache has already seen the object and has been set to check once a session.
  • A proxy cache has seen the object recently, and it was modified relatively long ago.
• Stale objects will be revalidated.
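The freshness rules above can be sketched roughly as follows. This is a minimal illustration, not how any particular cache is implemented: the function name `is_fresh`, its parameters, and the 10% heuristic fraction are all assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_fresh(
    now: datetime,
    fetched_at: datetime,
    max_age: Optional[int] = None,
    expires: Optional[datetime] = None,
    last_modified: Optional[datetime] = None,
    heuristic_fraction: float = 0.1,  # assumed value, for illustration only
) -> bool:
    """Return True if a cached object may be served without revalidation."""
    age = now - fetched_at
    # Explicit age-controlling directives are checked first.
    if max_age is not None:
        return age < timedelta(seconds=max_age)
    if expires is not None:
        return now < expires
    # Heuristic: an object that was modified long ago is unlikely to change soon,
    # so allow it to stay fresh for a fraction of its age at fetch time.
    if last_modified is not None:
        return age < (fetched_at - last_modified) * heuristic_fraction
    # No freshness information at all: treat the object as stale.
    return False
```

An object for which `is_fresh` returns False is not discarded; it is revalidated with the origin server before being served.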
Cache Control
• HTML Meta Tags vs. HTTP Headers
  • Meta tags are easy to use, but aren't very effective.
  • HTTP headers give a lot of control.
Example response headers:
    HTTP/1.1 200 OK
    Date: Fri, 30 Oct 1998 13:19:41 GMT
    Server: Apache/1.3.3 (Unix)
    Cache-Control: max-age=3600, must-revalidate
    Expires: Fri, 30 Oct 1998 14:19:41 GMT
    Last-Modified: Mon, 29 Jun 1998 02:28:12 GMT
    ETag: "3e86-410-3596fbbc"
    Content-Length: 1040
    Content-Type: text/html
The HTML document would follow these headers, separated by a blank line.
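As a rough illustration of how a client might read the example headers, the sketch below parses them with Python's standard `email` package and checks that the `Expires` time is one hour after `Date`, which agrees with `max-age=3600`. The variable names are assumptions for this example, and the status line is left out because it is not a header field.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Header block from the example response (status line omitted).
RAW_HEADERS = (
    "Date: Fri, 30 Oct 1998 13:19:41 GMT\n"
    "Server: Apache/1.3.3 (Unix)\n"
    "Cache-Control: max-age=3600, must-revalidate\n"
    "Expires: Fri, 30 Oct 1998 14:19:41 GMT\n"
    "Last-Modified: Mon, 29 Jun 1998 02:28:12 GMT\n"
    'ETag: "3e86-410-3596fbbc"\n'
    "Content-Length: 1040\n"
    "Content-Type: text/html\n"
    "\n"
)

headers = message_from_string(RAW_HEADERS)

# Convert the HTTP dates into datetime objects and compare them.
date = parsedate_to_datetime(headers["Date"])
expires = parsedate_to_datetime(headers["Expires"])
lifetime = (expires - date).total_seconds()  # 3600.0, matching max-age=3600
```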
Controlling Freshness
Expires HTTP Header
• The Expires HTTP header is the basic means of controlling caches.
• Expires: Fri, 30 Oct 1998 14:19:41 GMT
• Most Web servers allow this field to be set in several ways:
  • As an absolute time
  • Relative to the last modification time
  • Relative to the last access time
GMT = Greenwich Mean Time.
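A minimal sketch of producing an Expires value relative to some base time (for example the last access or last modification time) might look like this. The helper name `expires_header` is hypothetical, and the base time is assumed to be timezone-aware.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_header(base_time: datetime, lifetime_seconds: int) -> str:
    """Build an Expires header set lifetime_seconds after base_time.

    base_time must be timezone-aware; it is converted to GMT (UTC) first,
    because HTTP dates are always expressed in GMT.
    """
    expiry = base_time.astimezone(timezone.utc) + timedelta(seconds=lifetime_seconds)
    return "Expires: " + format_datetime(expiry, usegmt=True)
```

With a base time of Fri, 30 Oct 1998 13:19:41 GMT and a lifetime of 3600 seconds, this yields the header value shown on the slide.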
Controlling Freshness (cont.)
Cache-Control HTTP Header
• Introduced in HTTP/1.1.
Interesting Cache-Control response directives:
• max-age=[seconds]
• s-maxage=[seconds]
• public
• no-cache
• must-revalidate
• proxy-revalidate
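The directives listed above arrive as a single comma-separated header value. The sketch below shows one simple way to split that value and to pick the lifetime that applies to a given cache, using the rule that `s-maxage` applies only to shared (proxy) caches and takes precedence over `max-age` there. The function names are assumptions, and the parser deliberately ignores edge cases such as quoted arguments containing commas.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def lifetime_seconds(
    directives: Dict[str, Optional[str]], shared_cache: bool
) -> Optional[int]:
    """Return the explicit freshness lifetime in seconds, or None if absent."""
    # s-maxage only applies to shared caches and overrides max-age there.
    if shared_cache and directives.get("s-maxage") is not None:
        return int(directives["s-maxage"])
    if directives.get("max-age") is not None:
        return int(directives["max-age"])
    return None
```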
Validation
• One of the main issues in caching is “validation”, namely the process by which the cache verifies that a cached object is still valid.
• To this end, HTTP headers carry validators.
• A validator is used to find out whether the cached object is equivalent to the copy of the same object at the original server.
• The most common validator is the time that the document last changed, given in the Last-Modified field.
• A request received by a cache server can be classified into one of the following three categories:
  • A “miss”, namely the page is not in the cache.
  • A “hit” that requires validation, namely the object is found in the cache but must be validated.
  • A “hit” that does not require validation.
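The three request categories can be illustrated with a toy in-memory cache. Everything here (the `Entry` class, the `classify` function, and the use of a plain expiry timestamp as the only freshness signal) is an assumption made to keep the example short.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Entry:
    body: bytes
    expires_at: float  # seconds since the epoch


def classify(cache: Dict[str, Entry], url: str, now: float) -> str:
    """Classify a request as a miss, a hit needing validation, or a plain hit."""
    entry = cache.get(url)
    if entry is None:
        return "miss"                  # object is not in the cache
    if now >= entry.expires_at:
        return "hit-needs-validation"  # cached, but stale: ask the origin server
    return "hit"                       # cached and still fresh
```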
Validation (cont.)
• HTTP/1.1 adds a “validation mechanism” to increase caching efficiency:
  • When a server sends a document, it attaches a validator called an ETag.
  • An ETag is a unique identifier generated by the server, and it changes whenever the object does.
  • After the object expires, the proxy generates a conditional request with the cache validator attached to it.
  • The server then evaluates the request and responds either with “not modified” or with the full document.
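The server side of this exchange can be sketched as below. The conditional request carries the cached ETag in an `If-None-Match` header; the server answers 304 (Not Modified) when it still matches and 200 with the full document otherwise. The function name `respond` is hypothetical, and the comparison is simplified: a real server must also handle lists of ETags, weak validators, and `*`.

```python
from typing import Dict, Tuple


def respond(
    request_headers: Dict[str, str], current_etag: str, body: bytes
) -> Tuple[int, Dict[str, str], bytes]:
    """Answer a (possibly conditional) GET for an object with the given ETag."""
    if request_headers.get("If-None-Match") == current_etag:
        # The cached copy is still valid: send no body, only the validator.
        return 304, {"ETag": current_etag}, b""
    # No validator, or the object has changed: send the full document.
    return 200, {"ETag": current_etag}, body
```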
Frequently Asked Questions
• What are the most important things to make cacheable?
• How can I make my pages as fast as possible with caches?
• I've got a page that is updated often. How do I keep caches from giving my users a stale copy?
• My pages are password-protected; how do proxy caches deal with them?
Proxy Cache Servers
• What is a proxy server?
• Why put a cache on the proxy server?
• What is a transparent proxy server?
  • Main advantage: the user does not have to cooperate.
• Proxy Auto Configuration (PAC)
  • Combines the advantages of manual proxy configuration and transparent proxying.
Cooperative Caching
• The idea is that Web caches located at different places cooperate in order to improve overall performance.
• There are several protocols for cooperative caching:
  • The Internet Caching Protocol (ICP)
    • Serves mainly as an “object-location protocol”.
  • The Cache Array Routing Protocol (CARP)
    • Instead of performing queries, CARP uses hash-based routing to provide a deterministic “request resolution path” through an array of proxies.
    • While ICP uses its own messages, CARP uses HTTP messages.
    • Browsers can participate in this protocol using “proxy auto configuration” (PAC).
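The idea of hash-based routing can be illustrated with a small sketch: every member of the array computes a score for each (proxy, URL) pair and the request goes to the proxy with the highest score, so all members agree on the target without exchanging queries. This is only an illustration of the general technique (highest-score hashing), using SHA-256 as a stand-in; it is not the hash function that CARP itself specifies, and the function name `pick_proxy` is hypothetical.

```python
import hashlib
from typing import Sequence


def pick_proxy(url: str, proxies: Sequence[str]) -> str:
    """Deterministically choose the proxy responsible for a URL."""

    def score(proxy: str) -> int:
        # Combine proxy name and URL, then hash; SHA-256 is an assumption here.
        digest = hashlib.sha256(f"{proxy}|{url}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    # The proxy with the highest combined score wins.
    return max(proxies, key=score)
```

A useful property of this scheme is that removing a proxy that was not chosen for a URL leaves that URL's assignment unchanged, so only the objects held by the removed proxy need to move.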
ICP
• ICP allows a cache server to query other servers for an object. It may get:
  • An ICP hit message
  • An ICP miss message
  • Or no response within a time-out period
• Based on the query result, the server determines how to continue:
  • If the item is not found, the cache may query another server,
  • or it may get the object from the original server.
  • If the item is found, the cache sends an HTTP request to the other cache.
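The decision logic described above can be sketched without any networking. In this simplified model, the ICP replies have already been collected into a mapping from peer name to "HIT", "MISS", or None (no response before the time-out); the function name, the reply encoding, and the choice to fall back directly to the original server are assumptions for this example, not part of the ICP message format.

```python
from typing import Mapping, Optional


def choose_source(replies: Mapping[str, Optional[str]], origin: str) -> str:
    """Decide where to send the HTTP request after an ICP query round."""
    for peer, reply in replies.items():
        if reply == "HIT":
            # A peer cache holds the object: fetch it from that cache.
            return peer
    # Only misses or time-outs: fetch from the original server instead.
    return origin
```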
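Example: Hierarchical caching using ICP
[Figure: a browser, Level 1 caches, Level 2 caches, and the original server, with labeled steps: 1) HTTP request from the browser; 2) ICP queries at Level 1; 3) HTTP request to Level 2; 4) ICP queries at Level 2; 5) HTTP request to the original server.]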
Content Distribution Networks
• A content distribution network (CDN) can be viewed as a globally replicated Web server.
• Main idea: each replica is located in a different geographic area, rather than in the same server farm.
• A CDN usually consists of the following components:
  • A set of Web servers and/or cache servers
  • A dedicated, intelligent distribution mechanism to move data between the various servers
  • A mechanism to intelligently match the requesting user with the most efficient server
• Main issue of the CDN:
  • How to synchronize changes so that the same request sent to two different replicas at the same time gets the same response.
A CDN Example
[Figure: www.nytimes.com (NY), www.cnn.com (Atlanta), Isp-1 (London), and Isp-2 (LA), connected through the Internet.]
Reverse Caching
• Reverse caching is another term for placing a cache in front of a Web server or e-commerce application.
• It is called "reverse" because it is implemented by the administrators of the Web servers, rather than by the clients, to cache or distribute content from the servers or to offload processing.
• In reverse caching, the cache server not only stores pages from the Internet for the benefit of local users, but also stores local pages for the benefit of Internet users.
Reverse Caching (cont.)
• Forward caching stores downloaded Internet content for reuse within a given user group, such as employees, while reverse caching actively pushes a company's Web content out to a diverse group of remote users, such as customers, thereby improving response time for those users.
• Reverse caching is gaining popularity with e-commerce firms because pushing Web pages toward customers ensures fast service, even if those customers do not use caching technology themselves.