CSE 535 – Mobile Computing Lecture 8 : Data Dissemination

Presentation Transcript


  1. CSE 535 – Mobile Computing, Lecture 8: Data Dissemination. Sandeep K. S. Gupta, School of Computing and Informatics, Arizona State University

  2. Data Dissemination

  3. Communications Asymmetry • Network asymmetry • In many cases, downlink bandwidth far exceeds uplink bandwidth • Client-to-server ratio • Large client population, few servers • Data volume • Small requests for info, large responses • Again, downlink bandwidth more important • Update-oriented communication • Updates likely affect a number of clients

  4. Disseminating Data to Wireless Hosts • Broadcast-oriented dissemination makes sense for many applications • Can be one-way or with feedback • Sports • Stock prices • New software releases (e.g., Netscape) • Chess matches • Music • Election Coverage • Weather…

  5. Dissemination: Pull • Pull-oriented dissemination can run into trouble when demand is extremely high • Web servers crash • Bandwidth is exhausted [Figure: many clients pulling from a single, overwhelmed server]

  6. Dissemination: Push • Server pushes data to clients • No need to ask for data • Ideal for broadcast-based media (wireless) [Figure: server pushing to many clients at once]

  7. Broadcast Disks [Figure: server transmitting a repeating schedule of data blocks 1–6]

  8. Broadcast Disks: Scheduling [Figure: a round-robin schedule vs. a priority schedule in which hotter items appear more often]

  9. Priority Scheduling (2) • Random • Randomize the broadcast schedule • Broadcast "hotter" items more frequently • Periodic • Create a schedule that broadcasts hotter items more frequently… • …but the schedule is fixed, which allows mobile hosts to sleep until the items they need come around • The "Broadcast Disks: Data Management…" paper uses this approach • Simplifying assumptions • Data is read-only • The schedule is computed once and doesn't change… • …which means access patterns are assumed to stay the same

  10. "Broadcast Disks: Data Management…" • Order pages from "hottest" to coldest • Partition into ranges ("disks")—pages in a range have similar access probabilities • Choose broadcast frequency for each "disk" • Split each disk into "chunks" • maxchunks = LCM(relative frequencies) • numchunks(J) = maxchunks / relativefreq(J) • Broadcast program is then: for I = 0 to maxchunks - 1 for J = 1 to numdisks Broadcast( C(J, I mod numchunks(J) )

  11. Sample Schedule, from the Paper [Figure: sample broadcast program for three disks with relative frequencies 4, 2, and 1]

  12. Broadcast Disks: Research Questions • From Vaidya: • How to determine the demand for various information items? • Given demand information, how to schedule broadcast? • What happens if there are transmission errors? • How should clients cache information? • User might want data item immediately after transmission…

  13. Hot For You Ain't Hot for Me • The hottest data items are not necessarily the ones most frequently accessed by a particular client • Access patterns may have changed • Higher priority may be given to other clients • This might be the only client that considers this data important… • Thus: need to consider not only probability of access (as in standard caching), but also broadcast frequency • A bug in the soup: hot items are more likely to be cached! (Reduce their broadcast frequency?)

  14. Broadcast Disks Paper: Caching • Under traditional caching schemes, usually want to cache "hottest" data • What to cache with broadcast disks? • Hottest? • Probably not—that data will come around soon! • Coldest? • Ummmm…not necessarily… • Cache data with access probability significantly higher than broadcast frequency

  15. Caching, Cont. • PIX algorithm (Acharya) • Eject the page from the local cache with the smallest value of: (probability of access) / (broadcast frequency) • Means that pages that are more frequently accessed may still be ejected if they are expected to be broadcast frequently… (see the sketch below)
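A one-function sketch of PIX eviction in Python. The dictionaries of access probabilities and broadcast frequencies are illustrative assumptions; in practice these quantities must be estimated:

    def pix_victim(cache, access_prob, bcast_freq):
        # Evict the cached page with the smallest ratio
        #   probability of access / broadcast frequency
        return min(cache, key=lambda p: access_prob[p] / bcast_freq[p])

    access_prob = {"x": 0.40, "y": 0.10}  # x is accessed 4x as often as y...
    bcast_freq  = {"x": 8.0,  "y": 1.0}   # ...but is broadcast 8x as often
    print(pix_victim({"x", "y"}, access_prob, bcast_freq))  # -> x

Even though x is the hotter page, it is the one ejected: it will come around on the broadcast soon, while y would be expensive to wait for.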

  16. Broadcast Disks: Issues • User profiles • Provide information about data needs of particular clients • "Back channel" for clients to inform server of needs • Either advise server of data needs… • …or provide "relevance feedback" • Dynamic broadcast • Changing data values introduces interesting consistency issues • If processes read values at different times, are the values the same? • Simply guarantee that data items within a particular broadcast period are identical?

  17. Hybrid Push/Pull • "Balancing Push and Pull for Data Broadcast" (Acharya et al., SIGMOD '97) • "Pull Bandwidth" (PullBW) – the portion of bandwidth dedicated to pull-oriented requests from clients • PullBW = 100%: the schedule is totally request-based ("pure" pull) • PullBW = 0%: "pure" push; clients needing a page simply wait

  18. Interleaved Push and Pull (IPP) • Mixes push and pull • Allows a client to send requests to the server for missed (or absent) data items • The broadcast disk transmits the program plus requested data items (interleaved) • A fixed threshold ThresPerc limits use of the back channel by a particular client • The client sends a pull request for page p only if the number of slots before p will be broadcast is greater than ThresPerc • ThresPerc is a percentage of the cycle length • This also controls server load – as ThresPerc → 100%, the server is protected (see the sketch below)
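A sketch of the client-side decision in Python. Parameter names follow the slide; the request logic in the actual paper is more involved:

    def should_pull(slots_until_broadcast, cycle_length, thres_perc):
        # Pull a page only if the wait for its broadcast slot exceeds
        # ThresPerc percent of the cycle length; otherwise just wait.
        return slots_until_broadcast > (thres_perc / 100.0) * cycle_length

    # ThresPerc = 50 on a 100-slot cycle: pull only for waits longer than 50 slots
    print(should_pull(70, 100, 50))   # True  -> send a pull request
    print(should_pull(30, 100, 50))   # False -> wait for the broadcast
    print(should_pull(70, 100, 100))  # False -> back channel effectively disabled

Setting ThresPerc to 100% makes the condition unsatisfiable within a cycle, which is how a high threshold protects the server from client requests.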

  19. CSIM-based Simulation • Measured Client (MC) • Client whose performance is being measured • Virtual Client (VC) • Models the "rest" of the clients as a single entity… • …chewing up bandwidth, making requests… • Assumptions: • Front channel and back channel are independent • Broadcast program is static—no dynamic profiles • Data is read only

  20. Simulation (1) [Figure: simulation model; note: no feedback to clients!]

  21. Simulation (2) • Can control the ratio of VC to MC requests • Noise controls the similarity of the access patterns of VC and MC • Noise == 0 → same access pattern • The PIX algorithm is used to manage the client cache • VC's access pattern is used to generate the broadcast (since VC represents a large population of clients) • The goal of the simulation is to measure tradeoffs between push and pull under broadcast

  22. Simulation (3) • CacheSize pages are maintained in a local cache • SteadyStatePerc models the number of clients in the VC population that have "filled" caches—i.e., their most important pages are already cached • ThinkTimeRatio models the intensity of VC request generation relative to MC • A high ThinkTimeRatio means more activity on the part of the virtual clients

  23. Simulation (4) [Figure/table: simulation parameter settings]

  24. Experiment 1: Push vs. Pull [Figure: response time vs. server load] • Light loads: pull better • Important: PullBW is set at 50% in Figure 3a – if the server's pull queue fills, requests are dropped ("server death") • At PullBW = 10%, the reduction in broadcast bandwidth hurts push, yet is still insufficient for pull requests!

  25. Experiment 2: Cache Warmup Time for MC [Figure] • Low server load: pull better • High server load: push better

  26. Experiment 3: Noise: Are you (VC) like me (MC)? [Figures: on the left, pure push vs. pure pull; on the right, pure push vs. IPP]

  27. Experiment 4: Limiting Greed [Figure] • On the other hand, if there's plenty of bandwidth, limiting greed isn't a good idea

  28. Experiment 5: Incomplete Broadcasts [Figure] • Not all pages are broadcast—non-broadcast pages must be explicitly pulled • When the server is overwhelmed, requests are dropped! • In Figure 7b, making clients wait longer before requesting helps… • Lesson: must provide adequate bandwidth or response time will suffer!

  29. Incomplete Broadcast: More Lesson: Careful! At high server loads with lots of pages not broadcast, IPP can be worse than push or pull!

  30. Experimental Conclusions • Light server load: pull better • Push provides a safety cushion in case a pull request is dropped, but only if all pages are broadcast • Limits on pull provide a safety cushion that prevents the server from being crushed • Broadcasting all pages can be wasteful • But must provide adequate bandwidth to pull omitted pages… • Otherwise, at high load, IPP can be worse than pull! • Overall: Push and pull tend to beat IPP in certain circumstances • But IPP tends to have reasonable performance over a wide variety of system loads… • Punchline: IPP a good compromise in a wide range of circumstances

  31. Mobile Caching: General Issues • Mobile user/application issues: • Data access pattern (reads? writes?) • Data update rate • Communication/access cost • Mobility pattern of the client • Connectivity characteristics • disconnection frequency • available bandwidth • Data freshness requirements of the user • Context dependence of the information

  32. Mobile Caching (2) • Research questions: • How can client-side latency be reduced? • How can consistency be maintained among all caches and the server(s)? • How can we ensure high data availability in the presence of frequent disconnections? • How can we achieve high energy/bandwidth efficiency? • How to determine the cost of a cache miss and how to incorporate this cost in the cache management scheme? • How to manage location-dependent data in the cache? • How to enable cooperation between multiple peer caches?

  33. Mobile Caching (3) • Cache organization issues: • Where do we cache? (client? proxy? service?) • How many levels of caching do we use (in the case of hierarchical caching architectures)? • What do we cache (i.e., when do we cache a data item and for how long)? • How do we invalidate cached items? • Who is responsible for invalidations? • What is the granularity at which the invalidation is done? • What data currency guarantees can the system provide to users? • What are the (real $$$) costs involved? How do we charge users? • What is the effect on query delay (response time) and system throughput (query completion rate)?

  34. Weak vs. Strong Consistency • Strong consistency • The value read is the most current value in the system • Invalidation on each write can expire outdated values • Disconnections may cause loss of invalidation messages • Can also poll on every access • Impossible to poll if disconnected! • Weak consistency • The value read may be "somewhat" out of date • A TTL (time to live) is associated with each value • Can combine TTL with polling • e.g., background polling to refresh the TTL, or retrieval of a new copy of the data item if it is out of date (see the sketch below)
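A minimal weak-consistency cache sketch in Python, combining a TTL with re-fetch on expiry. The fetch callback and the single global TTL are assumptions for illustration, not a specific system's API:

    import time

    class TTLCache:
        def __init__(self, fetch, ttl_seconds):
            self.fetch = fetch            # called to get a fresh copy from the server
            self.ttl = ttl_seconds
            self.store = {}               # key -> (value, expiry timestamp)

        def read(self, key):
            value, expires = self.store.get(key, (None, 0.0))
            if time.time() >= expires:    # missing or past its TTL: re-fetch
                value = self.fetch(key)   # would fail while disconnected
                self.store[key] = (value, time.time() + self.ttl)
            return value                  # may be "somewhat" out of date

A background poller could call read() periodically (or simply extend the expiry after a successful check) to keep TTLs fresh, as the last bullet suggests.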

  35. Disconnected Operation • Disconnected operation is very desirable for mobile units • Idea: Attempt to cache/hoard data so that when disconnections occur, work (or play) can continue • Major issues: • What data items (files) do we hoard? • When and how often do we perform hoarding? • How do we deal with cache misses? • How do we reconcile the cached version of the data item with the version at the server?

  36. One Slide Case Study: Coda • Coda: file system developed at CMU that supports disconnected operation • Cache/hoard files and resolve needed updates upon reconnection • Replicate servers to improve availability • What data items (files) do we hoard? • User selects and prioritizes • Hoard walking ensures that cache contains the “most important” stuff • When and how often do we perform hoarding? • Often, when connected

  37. Coda (2) • (OK, two slides) • How do we deal with cache misses? • If disconnected, cannot • How do we reconcile the cached version of the data item with the version at the server? • When connection is possible, can check before updating • When disconnected, use local copies • Upon reconnection, resolve updates • If there are hard conflicts, user must intervene (e.g., it’s manual—requires a human brain) • Coda reduces the cost of checking items for consistency by grouping them into volumes • If a file within one of these groups is modified, then the volume is marked modified and individual files within can be checked

  38. WebExpress • Housel, B. C., Samaras, G., and Lindquist, D. B., "WebExpress: A Client/Intercept Based System for Optimizing Web Browsing in a Wireless Environment," Mobile Networks and Applications 3:419–431, 1998. • A system that intercepts web browsing, providing sophisticated caching and bandwidth-saving optimizations for web activity in mobile environments • Major issues: • Disconnected operation • Verbosity of the HTTP protocol → perform protocol reduction • TCP connection setup time → try to re-use TCP connections • Low bandwidth in wireless networks → caching • Many responses from web servers are very similar to those seen previously → use differencing rather than returning complete responses, particularly for CGI-based interactions (see the sketch below)
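A toy illustration of the differencing idea in Python, using difflib as a stand-in. WebExpress defines its own differencing scheme for CGI responses; unified diffs are just a convenient approximation here:

    import difflib

    def make_delta(base_response, new_response):
        # Send only the difference against the base copy that both the
        # client-side and network-side intercepts already hold.
        return "".join(difflib.unified_diff(
            base_response.splitlines(keepends=True),
            new_response.splitlines(keepends=True),
            fromfile="base", tofile="new"))

    base = "<html><body>\nIBM: 101.2\nSUN: 44.5\nMSFT: 29.1\n</body></html>\n"
    new  = base.replace("101.2", "101.7")   # one quote changed
    print(make_delta(base, new))            # only the changed region travels

The receiving intercept applies the delta to its cached base copy to reconstruct the full response, so the wireless link carries only the change.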

  39. WebExpress (2) [Figure: WebExpress architecture] • Two intercepts: one on the client side and one in the wired network • One TCP connection is re-used between the intercepts • Redundant HTTP header info is reduced on the client side and reinserted on the server side • Caching on both the client and the wired network, plus differencing
