Challenges in Ubiquitous Data Management
Michael Franklin, UC Berkeley
August 2000
Ubiquitous Computing
• “In ten years, billions of people will be using the Web, but a trillion "gizmos" will also be connected to the Web.” Asilomar Rep. on DB Research, Dec. 1998
• You’ve heard it before…
• Smartphones, PDAs, Smartcards, badges, wearables, lightswitches, toasters, …
• Worldwide sales of Internet-enabled appliances projected to grow from 5.9M units in 1998 to 55.7M units in 2002. IDC via H&Q report
M. Franklin – Aug. ‘00
[Figure, by way of Randy Katz. Eras: many people per computer; one person per computer; many computers per person. Labels: Batch, RJE, Time Sharing, WS/Server, PC + Network, Scaled-down PCs (desktop metaphor), Information Appliances, Ubiquitous Computers. Axes: Distribution and Personalization, each running from Less to More.]
Ubiquitous Connectivity
• Tremendous improvements in Internet backbone bandwidth and reductions in diameter.
• Broadband connectivity to the home and office (i.e. the “last mile”) is solved.
• Wireless technologies are enabling anytime-anywhere connectivity.
Ubiquitous Data Access
• But ubiquitous computing and connectivity aren’t worth much without ubiquitous data access.
• “Fundamentally, the ability to access all information from anywhere and have ONE unified and synchronized information repository is critical to making appliances useful.” Hambrecht and Quist, iWord, 3/99
• Ubiquitous data access will put existing data management techniques to the test, in all aspects – searching, location, reliability, consistency, …
Ubiquitous Data: Past Accomplishments
• Database Systems
  • Relational Model and extensions
  • Data Independence (physical and logical)
  • Declarative query processing and cost-based optimization
  • Storage structures/distribution/parallelism/…
  • Transactions – a comprehensive model for concurrency and fault tolerance.
• Information Retrieval
  • More natural query interfaces
  • User interaction/feedback is designed in
  • Tolerance of ambiguity due to natural language and unstructured data.
Ubiquitous Data – State of the Art
• Everyone uses a database system and/or search engine every day, although they may not realize it (the true test of “ubiquity”)!
• The Internet and WWW have become a ubiquitous means of global data dissemination and exchange.
• Databases play a crucial but largely invisible role here.
• XML and related standards are enabling increasingly sophisticated interoperation.
• Wireless access provides anytime-anywhere access and enables location-centric applications.
Where it’s heading
• TV/Phone/Internet/etc. convergence.
• Mobility and user context-sensitive applications.
• Global utility-oriented infrastructure.
• Data streams (broadcast and otherwise).
• Alerters (agents?) and context-sensitive delivery.
• “In the future, the main bottleneck will be human attention.” But how far in the future?
A True Paradigm Shift
• Data management research in the 80’s and 90’s was all about “ilities” (or “alities”):
  • functionality
  • scalability
  • serializability
  • optimality
  • interoperability
Paradigm Shift (continued)
• In the world of ubiquitous data access we need to shift from “ilities” to “ations”!
  • functionality → personalization
  • scalability → globalization
  • serializability → synchronization
  • optimality → flow regulation
  • interoperability → integration
1) Personalization
• Filtering the data flood
  • There’s too much information out there
  • Systems will have to help people find what they need
  • Systems will actively suggest information and sites based on the user’s interests.
• Data delivery must be made context-aware
  • Location-centric applications
  • Task- and role-sensitive delivery
• The key technology is User Profiles
Example: “Data Recharging” Profiles
• Three main components:
  1) Content-based specifications of user interests (read “queries”)
  2) Specifications of user priorities/requirements: priority ordering, resolution, freshness, dependencies
  3) User context information – where, when, who, what
• This info is available in the user’s PIM data!
• Profiles must be both specified explicitly and learned automatically.
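The three profile components above can be pictured as a small data structure. The following is only a minimal sketch: the class names (`Interest`, `Context`, `Profile`), the field names, and the item layout (a dict with a `timestamp` key) are all hypothetical and are not taken from the Data Recharging work. Only priority and freshness are modeled from component 2; resolution and dependencies are left out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, Optional


@dataclass
class Interest:
    """Component 1: a content-based interest, i.e. a named predicate ("query")."""
    name: str
    matches: Callable[[dict], bool]
    # Component 2 (partial): priority and freshness requirements.
    priority: int = 0                          # higher is delivered first
    max_age: timedelta = timedelta(hours=1)    # oldest acceptable item


@dataclass
class Context:
    """Component 3: where, when, who, what."""
    where: str
    when: datetime
    who: str
    what: str  # the current task


@dataclass
class Profile:
    interests: list[Interest] = field(default_factory=list)
    context: Optional[Context] = None

    def select(self, items: list[dict]) -> list[dict]:
        """Return fresh, matching items, highest priority first."""
        now = self.context.when if self.context else datetime.now()
        picked = []
        for item in items:
            for interest in self.interests:
                fresh = now - item["timestamp"] <= interest.max_age
                if fresh and interest.matches(item):
                    picked.append((interest.priority, item))
                    break  # one matching interest is enough
        picked.sort(key=lambda pair: pair[0], reverse=True)
        return [item for _, item in picked]
```

A profile built this way is specified explicitly; learning it automatically, as the slide calls for, would mean adjusting the predicates and priorities from observed user behavior.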
2) Globalization
• Universal connectivity + cheap storage enables new solutions for availability and durability:
  • Large-scale and dynamic replication
  • Fault/disaster tolerance through RAID-like techniques.
  • Security is a fundamental open problem here.
• Archival storage: capture all data, voice, video, programs
  • How to find anything?
  • Formats become obsolete – how to carry them forward?
  • How to ensure a realistic reproduction? e.g. old video games or web surfing circa 2000
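To make “RAID-like techniques” concrete, here is a minimal sketch of the simplest such scheme, single XOR parity: data blocks are spread over several sites together with one parity block, and any one lost block can be rebuilt from the rest. The function names are hypothetical, the blocks are assumed to be of equal length, and a global-scale store would likely need stronger codes that survive several simultaneous losses.

```python
from functools import reduce
from typing import Optional


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks: list[bytes]) -> list[bytes]:
    """Return the data blocks plus one parity block (one block per site)."""
    return data_blocks + [xor_blocks(data_blocks)]


def recover(stored: list[Optional[bytes]]) -> list[bytes]:
    """Rebuild at most one missing block (marked None) and return the data."""
    missing = [i for i, block in enumerate(stored) if block is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost block")
    stored = list(stored)
    if missing:
        survivors = [block for block in stored if block is not None]
        # XOR of all surviving blocks equals the lost block.
        stored[missing[0]] = xor_blocks(survivors)
    return stored[:-1]  # drop the parity block
```

For example, `recover` applied to the output of `encode([b"ubiq", b"data", b"mgmt"])` with any one of the four entries replaced by `None` returns the three original blocks.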
Example: Berkeley’s OceanStore
• Based on a global storage “utility” model
• Think of a safe and principled Gnutella
[Figure, from J. Kubiatowicz. Labels: Canadian OceanStore, Sprint, AT&T, IBM, Pac Bell.]
3) Synchronization
• Many different types of data
  • Enterprise – Inventory, ERP, …
  • Web content – Stock quotes, news, weather, other
  • Personal data – calendar, contacts, email, …
• All have different requirements for consistency.
• The traditional notion of ACID transaction semantics is not appropriate for most of these.
Synchronization (continued)
• Other problems with transactional approaches:
  • Scale – # of devices, data size, etc.
  • Varying degrees of connectivity.
  • Lack of clear transactional boundaries (e.g. continuous queries)
  • “Closed world” assumption is inappropriate.
  • Data spans multiple administrative domains
  • Interactive systems – “User in the loop”
• Some alternatives:
  • Data Dissemination, Data Recharging, Explicit user-directed synchronization, Epidemic Algorithms
Example: Epidemic Protocol
[Figure, by way of Ugur Cetintemel. Label: Conflict?]
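The picture itself did not survive, so the following is only a generic sketch of an epidemic (gossip) protocol, not the specific protocol shown on the slide. Each replica periodically picks a random peer and the two exchange entries (push-pull anti-entropy). The `Replica` class, the per-key version counter, and the tie-break rule are all assumptions made for illustration. A plain counter only catches the case where two replicas wrote the same version independently; version vectors would be the usual tool for detecting conflicts in general.

```python
import random


class Replica:
    def __init__(self, name: str):
        self.name = name
        self.store = {}      # key -> (version, writer, value)
        self.conflicts = 0   # concurrent updates noticed during exchanges

    def write(self, key, value):
        """Local update: bump the version and record this replica as writer."""
        version = self.store.get(key, (0, "", None))[0] + 1
        self.store[key] = (version, self.name, value)

    def exchange(self, peer: "Replica"):
        """Push-pull anti-entropy: afterwards both sides hold the same entries."""
        for key in set(self.store) | set(peer.store):
            mine, theirs = self.store.get(key), peer.store.get(key)
            if mine == theirs:
                continue
            if mine and theirs and mine[0] == theirs[0]:
                self.conflicts += 1  # same version from different writers
            # Assumed policy: highest (version, writer) wins.
            candidates = [e for e in (mine, theirs) if e is not None]
            winner = max(candidates, key=lambda e: (e[0], e[1]))
            self.store[key] = peer.store[key] = winner


def gossip_round(replicas: list, rng: random.Random):
    """Every replica contacts one randomly chosen peer."""
    for replica in replicas:
        peer = rng.choice([r for r in replicas if r is not replica])
        replica.exchange(peer)


def converged(replicas: list) -> bool:
    return all(r.store == replicas[0].store for r in replicas)
```

Two replicas that update the same calendar entry while disconnected will both produce version 1; when the updates meet, the exchange counts a conflict and the tie-break picks one. A real system might instead surface the conflict to the user, in line with the “user in the loop” point above.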
4) Data Flow Regulation
• Pervasive network connectivity enables global-scale federated DBMSs.
• Improvements in heterogeneous DBMSs and emerging standards enable Internet query processing.
• Users and data are increasingly mobile.
• Continuous data streams from sensors, stock tickers, updates to web sites, etc.
Why Standard DBMS Won’t Work
• Can’t deal with arbitrary services
• Can’t adapt while running
  • need a “continuous” query optimizer
  • need to handle midstream failover or redirection.
  • Reload, alternate sites
• Uses the wrong Query Processing algorithms
  • Can’t produce incremental results, but the data stream never ends!
• Can’t understand cost/quality tradeoffs
  • maybe I’d settle for something less if it went faster
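The complaint about incremental results can be illustrated with a pipelined (symmetric) hash join, a standard non-blocking alternative to a join that must read one whole input before emitting anything. This is a sketch under stated assumptions: both inputs arrive interleaved on one stream as `(side, key, payload)` triples, a layout invented here for illustration, and memory is assumed to be unbounded, which a real stream processor could not afford.

```python
from collections import defaultdict
from typing import Hashable, Iterable, Iterator


def symmetric_hash_join(
    stream: Iterable[tuple[str, Hashable, object]]
) -> Iterator[tuple]:
    """Join two interleaved inputs, yielding each match as soon as it exists.

    Each stream element is (side, key, payload) with side "L" or "R".
    Results are (key, left_payload, right_payload).
    """
    tables = {"L": defaultdict(list), "R": defaultdict(list)}
    for side, key, payload in stream:
        other = "R" if side == "L" else "L"
        tables[side][key].append(payload)          # build into own table
        for match in tables[other].get(key, []):   # probe the other table
            yield (key, payload, match) if side == "L" else (key, match, payload)
```

Because the function is a generator, a consumer can take the first result with `next(...)` while the input is still arriving, and the join never needs to see the end of either stream.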
Adaptive Approaches
• Increased uncertainty argues for increased adaptivity.
  • Wide-area nets and admin domains introduce uncertainty.
  • Pesky users introduce uncertainty.
  • Mobility and streams introduce uncertainty.
• The Telegraph project at Berkeley is building an adaptive data flow processing infrastructure using radically new techniques (almost, but not quite anarchy…).
[Figure: a spectrum of adaptivity. Axis labels: static plans, late binding, reopt., continuous opt., anarchy. Systems placed along it: current DBMS; Dynamic, Parametric, Competitive, …; Query Scrambling; XJoin; Eddy; ???]
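As a rough illustration of continuous optimization, the sketch below re-decides the order of a set of filters for every tuple, using the pass rates observed so far, instead of fixing a plan up front. It is a greatly simplified toy in the spirit of the Eddy named above, not Telegraph’s implementation; the class name `AdaptiveFilterRouter` and its routing rule (lowest observed pass rate first) are assumptions made for this example.

```python
from typing import Callable, Iterable, Iterator


class AdaptiveFilterRouter:
    """Route each tuple through all filters, most selective (so far) first."""

    def __init__(self, filters: dict[str, Callable[[dict], bool]]):
        self.filters = filters
        self.seen = {name: 0 for name in filters}
        self.passed = {name: 0 for name in filters}

    def pass_rate(self, name: str) -> float:
        # Unseen filters get rate 0.0 so each one is tried early at least once.
        return self.passed[name] / self.seen[name] if self.seen[name] else 0.0

    def run(self, tuples: Iterable[dict]) -> Iterator[dict]:
        for item in tuples:
            # The "plan" is recomputed per tuple from current statistics.
            order = sorted(self.filters, key=self.pass_rate)
            for name in order:
                self.seen[name] += 1
                if not self.filters[name](item):
                    break  # tuple rejected; skip the remaining filters
                self.passed[name] += 1
            else:
                yield item  # passed every filter
```

The output does not depend on the order chosen, only the amount of work does: a filter that rejects most tuples drifts to the front, so the others are evaluated less often. If the data changes mid-stream and a different filter becomes the selective one, the statistics, and with them the order, shift without restarting the query.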
5) Interoperation
• Data and application integration is still a difficult problem.
• People have realized that there is no silver bullet.
• The Internet has made the tough work required to do this integration seem more worthwhile.
• XML and its related standardization efforts provide the basic plumbing for large-scale interoperation.
• The key is to develop a more flexible and evolving approach.
Conclusions
• Ubiquitous Data Access is real
• UDA challenges all aspects of existing data management technology.
• We need to build systems to protect humans from the data flood, but good old systems performance issues still matter.
• What is the killer app for Ubiquitous Data Access?
• Most existing examples are
  • boring (replay TV)
  • silly (business meetings in the park)
  • or irritating (buy milk now!!!)