About Me
Joshua Silver
• 4th year CS major – graduating in May
• Specialization: Databases
• Interests:
  • The business side of computing … and no, not IT
  • How can companies use technology to improve and enable their business
  • Think Enterprise Web 2.0, mobile strategies, viral promotion on the internet, Netflix recommendation engine, e-commerce, etc.
  • Startups!
Sleepers & Workaholics
Caching Strategies in Mobile Computing
Authors: Dr. Daniel Barbará and Dr. Tomasz Imielinski
Presented by: Joshua Silver, Fall 2008
Sleepers & Workaholics
Caching Strategies in Mobile Computing
Dr. Daniel Barbará
• Professor at George Mason University
• Several patents associated with mobile caching
Dr. Tomasz Imielinski
• Professor at Rutgers University
• Senior VP: Search Technology at Ask.com
The Big Picture Problem
• Wireless devices have limited bandwidth, limited storage, and limited battery life
• To save power, devices go offline
• Mobile devices appear randomly in new cells
• This makes data caching difficult, since the server can’t track client caches
Then and now
• Paper written in 1994
• Device, bandwidth, and battery limitations are different today
• Essential problem still exists
With an explosion of wireless devices, the problem is even greater
• 24 Million in 1994
• >240 Million in 2008
… and that doesn’t even take into account proprietary handheld units (like UPS driver delivery computers, Amazon Kindles, grocery store handheld scanners, etc.)
Source: CTIA-The Wireless Association. http://www.infoplease.com/ipa/A0933563.html
Why Caching is Important
Conserve:
• Computational resources
• Battery life
• Network bandwidth
Can’t store entire dataset on handheld:
- US maps on GPS unit
- Delivery routes for UPS drivers
- Contact list on Blackberry
Traditional Strategies Fail
In a traditional client-server model, the server:
• keeps track of client caches
• pushes only the changes / sends cache invalidation messages
BUT… the server lacks knowledge of:
• Which units are in its cell
• Which units are powered ON
Quintessential problem: Client caches in a mobile environment cannot be tracked by a server
The Solution
Purpose: "…to propose a taxonomy of different cache invalidation strategies and study the impact of clients' disconnection times on their performance."
Sleepers & Workaholics proposes a few solutions and evaluates their effectiveness with mathematical rigor
Evaluation Criteria
Complicated math! … The paper’s appendices have details.
Essentially: define two types of Mobile Units
• Sleepers (offline/off all the time)
• Workaholics (never go offline)
• Almost all real world devices fall in between
How do you compare? Normalize by defining a “hit ratio” (the fraction of queries answered from the local cache), since it affects overall throughput
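As a rough illustration of the metric (not the paper's analytical model, which is in its appendices), the hit ratio can be measured from a query log. The function name, item names, and data below are hypothetical.

```python
def hit_ratio(query_log, cache):
    """Fraction of queries that could be answered from the local cache."""
    if not query_log:
        return 0.0  # no queries yet: define the ratio as 0 to avoid dividing by zero
    hits = sum(1 for item in query_log if item in cache)
    return hits / len(query_log)


# Hypothetical example: a delivery unit caching two routes.
cache = {"route_17", "route_42"}
queries = ["route_17", "route_99", "route_42", "route_17"]
print(hit_ratio(queries, cache))  # 0.75 (3 of 4 queries hit the cache)
```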
Strategies to Evaluate
Proposed Strategies:
• Timestamps (TS)
• Amnesic Terminals (AT) (only remembering part – like amnesia)
• Signatures (SIG)
Control Strategy:
• No Cache (NC)
Timestamps
- Each cache entry has a timestamp
- Synchronous, history based, uncompressed in nature
SERVER:
• Communicates with clients every n seconds (and retries until successfully connected)
• Sends a list of changed items and their associated timestamps (to accommodate potential delay in transmission)
CLIENT: For each item in cache:
• If the entry is in the report received from the server, purge it from the cache
• If NOT in the report, simply update its timestamp to the current time
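The client side of the Timestamps strategy can be sketched as below. This is a minimal sketch, not the paper's exact algorithm: the data layout and function name are hypothetical, and comparing the reported timestamp against the cached one (so a copy fetched after the update survives) is an assumption about why the server sends timestamps at all.

```python
def apply_ts_report(cache, report, report_time):
    """Apply one invalidation report to a client cache, in place.

    cache:  {item_id: (value, cached_timestamp)}
    report: {item_id: last_update_timestamp}, as sent by the server
    """
    for item_id in list(cache):  # copy the keys: entries are deleted while looping
        value, cached_at = cache[item_id]
        # Assumption: the copy is stale only if the server's update is newer than it.
        if item_id in report and report[item_id] > cached_at:
            del cache[item_id]
        else:
            # Not invalidated: the copy is known to be good as of this report.
            cache[item_id] = (value, report_time)
    return cache


cache = {"a": ("A", 10), "b": ("B", 10), "c": ("C", 25)}
apply_ts_report(cache, {"a": 20, "c": 20}, report_time=30)
print(cache)  # {'b': ('B', 30), 'c': ('C', 30)}
```

Item "a" is purged because it changed at time 20, after it was cached at time 10; "c" was cached at time 25, after the reported change, so it is kept.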
Amnesic Terminals
- Each cache entry has an identifier
- ALSO synchronous, history based, uncompressed in nature
SERVER:
• Notifies clients of the identifiers of items changed since the last invalidation report
CLIENT: For each item in cache:
• If in report, purge from cache
• If NOT in report, do nothing
• ALSO, if enough time has elapsed since the last report was received, drop the WHOLE cache and rebuild completely
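A minimal sketch of the Amnesic Terminals client, under stated assumptions: the function name and parameters are hypothetical, and "enough time has elapsed" is modeled as the gap between received reports exceeding `max_gap` (meaning at least one report may have been missed).

```python
def apply_at_report(cache, changed_ids, report_time, last_report_time, max_gap):
    """Apply one Amnesic Terminals report to `cache` (a dict), in place.

    Returns the new "last report heard" time for the caller to store.
    """
    if report_time - last_report_time > max_gap:
        # The unit slept through at least one report, so nothing can be trusted.
        cache.clear()
    else:
        for item_id in changed_ids:
            cache.pop(item_id, None)  # purge if cached, otherwise ignore
    return report_time


cache = {"a": 1, "b": 2, "c": 3}
last = apply_at_report(cache, {"b"}, report_time=20, last_report_time=10, max_gap=10)
print(cache)  # {'a': 1, 'c': 3}
last = apply_at_report(cache, set(), report_time=50, last_report_time=last, max_gap=10)
print(cache)  # {} (the unit was away too long, so the whole cache is dropped)
```

The report carries only identifiers, which keeps it short, but the price is that a unit that misses a single report has to forget everything.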
Signatures
- Checksums calculated over the value of data to form a signature
- Since the mobile unit does not have the entire database, an algorithm is needed to compute a partial checksum – see the appendix
- Signatures combined using XOR
- Synchronous, state based, compressed reports
SERVER:
• Broadcasts the set of combined signatures
CLIENT:
• An item in the cache is declared invalid if it belongs to “too many” unmatching signatures (suspected of being out of date)
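The idea can be sketched roughly as follows. Everything concrete here is an assumption made for illustration, not the paper's construction: the 16-bit signatures derived from SHA-256, the hash-based rule for which items fall into which subset, the 0.9 threshold standing in for "too many", and the client comparing the new broadcast against the previous one it heard.

```python
import hashlib


def item_signature(value, bits=16):
    """Short checksum over an item's value (hypothetical: truncated SHA-256)."""
    digest = hashlib.sha256(repr(value).encode()).digest()
    return int.from_bytes(digest[:4], "big") % (1 << bits)


def in_subset(item_id, subset_index):
    """Membership rule shared by server and client (assumption: about half of all items)."""
    digest = hashlib.sha256(f"{subset_index}:{item_id}".encode()).digest()
    return digest[0] % 2 == 0


def combined_signatures(database, num_subsets):
    """Server side: XOR together the signatures of the items in each subset."""
    combined = [0] * num_subsets
    for item_id, value in database.items():
        sig = item_signature(value)
        for k in range(num_subsets):
            if in_subset(item_id, k):
                combined[k] ^= sig
    return combined


def invalidate(cache, old_report, new_report, threshold=0.9):
    """Client side: drop items that sit in "too many" subsets whose signature changed."""
    changed = {k for k, (a, b) in enumerate(zip(old_report, new_report)) if a != b}
    for item_id in list(cache):
        memberships = [k for k in range(len(new_report)) if in_subset(item_id, k)]
        if not memberships:
            continue  # item is covered by no subset, so there is no evidence either way
        mismatches = sum(1 for k in memberships if k in changed)
        if mismatches / len(memberships) >= threshold:
            del cache[item_id]
    return cache
```

A hypothetical run: the server builds `old = combined_signatures(db, 32)`, an item changes, it builds `new` the same way, and the client calls `invalidate(cache, old, new)`. The report size depends on the number of subsets rather than on how many items changed, which is why the reports are compressed; the cost is that an unchanged item can occasionally be invalidated by mistake.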
No Cache
There is no cache
SERVER:
• Responds to direct queries from the client with appropriate information
CLIENT:
• Queries the database directly any time an item is needed
Conclusions on Effectiveness
Strategy depends on circumstances:
• Signatures are best for long sleepers, when the disconnection period is long and difficult to predict
• Timestamps are best for query-intensive scenarios, when the rate of queries is greater than the rate of updates, provided that units are not workaholics
• Amnesic Terminals are best for workaholics, units that are awake most of the time
Still not satisfied … how can we improve effectiveness?
Only 2 options:
1. Update less often, or
2. Send less info
Relax the Consistency of the Cache
Depending on data type, data may not need to be exact (e.g., stocks, weather)
Allow values to vary within a set tolerance (like 0.05% for stock prices, or weather reports up to 2 hours out of date)
This makes shorter invalidation reports possible
How Do We Decide to Update?
- Consider cached copies to be quasi-copies
- Each quasi-copy has a coherency condition attached to it
Coherency Conditions:
• Delay Condition - updated based on time
• Arithmetic Condition - updated based on the difference between the data and its quasi-copy
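The two coherency conditions can be sketched as a single check that the server would run for each quasi-copy, since it is the side that knows the current value. The class, field names, and thresholds below are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuasiCopy:
    value: float                          # value the client is holding
    cached_at: float                      # time the copy was made (seconds)
    max_age: Optional[float] = None       # delay condition: tolerated staleness in seconds
    max_rel_diff: Optional[float] = None  # arithmetic condition: tolerated relative drift


def needs_update(copy, current_value, now):
    """True if the quasi-copy violates any coherency condition attached to it."""
    if copy.max_age is not None and now - copy.cached_at > copy.max_age:
        return True
    # The relative drift is undefined for a cached value of 0, so that case is skipped.
    if copy.max_rel_diff is not None and copy.value != 0:
        drift = abs(current_value - copy.value) / abs(copy.value)
        if drift > copy.max_rel_diff:
            return True
    return False


stock = QuasiCopy(value=100.0, cached_at=0.0, max_rel_diff=0.0005)  # 0.05% tolerance
weather = QuasiCopy(value=21.0, cached_at=0.0, max_age=2 * 3600)    # 2-hour tolerance

print(needs_update(stock, 100.04, now=10))    # False: drift of 0.04% is within tolerance
print(needs_update(stock, 100.10, now=10))    # True: drift of 0.10% exceeds 0.05%
print(needs_update(weather, 25.0, now=3600))  # False: only 1 hour old
print(needs_update(weather, 21.0, now=7201))  # True: older than 2 hours
```

Only items for which `needs_update` returns True have to appear in the next invalidation report, which is how relaxing consistency shortens the reports.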
Criticism
• The assumptions about which resources are most scarce are no longer entirely accurate (e.g., bandwidth and battery life are better than predicted)
• Units are rarely powered down
• Battery life is better than predicted
• Battery life is not the only thing that dictates use patterns … reception does also
• Units still lose reception frequently
  • Today’s most common “sleeper” condition, yet explicitly excluded from the definition in S&W