Dynamo, Amazon’s NoSQL Database Bogdan Ghidireac amazon.com
NoSQL Databases Dynamo Architecture
Document
ID01 -> {
  "glossary": {
    "title": "example glossary",
    "GlossDiv": {
      "title": "S",
      "GlossList": {
        "GlossEntry": {
          "ID": "SGML",
          "SortAs": "SGML",
          "GlossTerm": "Standard Generalized Markup Language",
          "Acronym": "SGML",
          "Abbrev": "ISO 8879:1986",
          "GlossDef": {
            "para": "A meta-markup language, used to create markup languages such as DocBook.",
            "GlossSeeAlso": ["GML", "XML"]
          },
          "GlossSee": "markup"
        }
      }
    }
  }
}
Key-Value
ID01 -> readme.txt
ID02 -> big_picture.png
ID03 -> book.doc
Column
ID101 -> {
  ProductName = "Book 101 Title"
  ISBN = "111-1111111111"
  Authors = [ "Author 1", "Author 2" ]
  Price = -2
  Dimensions = "8.5 x 11.0 x 0.5"
  PageCount = 500
  InPublication = 1
  ProductCategory = "Book"
}
ID201 -> {
  ProductName = "18-Bicycle 201"
  Description = "201 description"
  BicycleType = "Road"
  Brand = "Brand-Company A"
  Price = 100
  Gender = "M"
  Color = [ "Red", "Black" ]
  ProductCategory = "Bike"
}
Motivation
• Highly available storage for shopping cart
• Existing Oracle solution could not scale
• 99.99% availability
Key Principles
• Keep It Simple
• Replication is necessary for high availability
• Symmetry: nobody’s special
• Decentralization: favor peer-to-peer techniques over centralized control
Consistency vs. Availability
• There’s a fundamental tension between consistency and availability
• Formalized by the “CAP Dilemma”: can’t simultaneously achieve strong consistency and availability in the presence of network partitions (Brewer, 1998)
• For the highest availability, have to be willing to sacrifice consistency
• Our design embraces this consistency tradeoff as a first principle
Dynamo: Replicated DHT with Consistency Management
• Consistent hashing
• Optimistic replication
• “Sloppy quorum”
• Anti-entropy mechanisms
• Object versioning
Load Balancing Partitioning and Replication
[Figure: hash ring over the space 0 to 2^128 with nodes A, B, C, D, E and F. h(key1) and h(key2) map to points on the ring; N=3.]
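The ring on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not name a hash function, so MD5 is used here because it yields 128 bits and matches the 0 to 2^128 space; the `Ring` class, the `preference_list` method and the rule "walk clockwise and take the first N distinct nodes" are hypothetical names and a simplified reading of the figure.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 128  # hash space shown on the slide: 0 .. 2^128


def h(key: str) -> int:
    """Map a string onto the ring (MD5 is an assumption; it produces 128 bits)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    """Hypothetical helper: one ring position (token) per node."""

    def __init__(self, nodes):
        # Sort nodes by their position on the ring.
        self._tokens = sorted((h(name), name) for name in nodes)
        self._positions = [pos for pos, _ in self._tokens]

    def preference_list(self, key: str, n: int = 3):
        """Return the first n distinct nodes met walking clockwise from h(key)."""
        # First token strictly after the key's position, wrapping past 2^128 to 0.
        start = bisect.bisect_right(self._positions, h(key)) % len(self._tokens)
        result = []
        for i in range(len(self._tokens)):
            node = self._tokens[(start + i) % len(self._tokens)][1]
            if node not in result:
                result.append(node)
            if len(result) == n:
                break
        return result


ring = Ring(["A", "B", "C", "D", "E", "F"])
replicas = ring.preference_list("key1", n=3)
print(replicas)  # three distinct node names; which ones depends on the hash values
```

Adding or removing a node in this scheme only changes ownership for keys adjacent to that node's position, which is the property that makes consistent hashing attractive for partitioning.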
Load Balancing
[Figure: hash ring over the space 0 to 2^128 on which each of the nodes A, B, C and D appears at four positions.]
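One way to read this slide is that each physical node is placed at several positions on the ring (often called virtual nodes). The sketch below, again assuming MD5 as the hash and using hypothetical helper names (`build_ring`, `key_counts`) and made-up key names, counts how many keys each node owns for different numbers of positions per node. It is meant to let the reader observe the distribution, not to reproduce Dynamo's actual token assignment.

```python
import bisect
import hashlib
from collections import Counter


def h(key: str) -> int:
    """128-bit ring position (MD5 is an assumption, not taken from the slides)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def build_ring(nodes, positions_per_node):
    """Place every physical node at several ring positions."""
    return sorted(
        (h(f"{node}#{i}"), node)
        for node in nodes
        for i in range(positions_per_node)
    )


def key_counts(ring, keys):
    """Count how many keys fall to each physical node."""
    positions = [pos for pos, _ in ring]
    counts = Counter()
    for key in keys:
        # Owner is the first position clockwise from the key, with wrap-around.
        idx = bisect.bisect_right(positions, h(key)) % len(ring)
        counts[ring[idx][1]] += 1
    return counts


nodes = ["A", "B", "C", "D"]
keys = [f"cart-{i}" for i in range(10_000)]  # hypothetical key names

single = key_counts(build_ring(nodes, 1), keys)
four = key_counts(build_ring(nodes, 4), keys)    # four positions per node, as drawn
many = key_counts(build_ring(nodes, 128), keys)

for label, counts in (("1 position", single), ("4 positions", four), ("128 positions", many)):
    print(label, dict(sorted(counts.items())))
```

With more positions per node the per-node counts generally move closer to an even split, and a departing node's load is spread over several neighbours instead of one.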
“Sloppy Quorum”
• Configurable N, R, W
• N replicas in ideal state
• Successful read involves at least R nodes
• Successful write involves at least W nodes
• Sloppy Quorum: dynamic membership based on node availability
• “Always Writable” with tunable probability of “Read Your Writes” consistency
“Sloppy Quorum”
[Figure: hash ring over the space 0 to 2^128 with nodes A to F and h(key1), configured with N=3, R=2, W=2. Operations shown: put(key1,v1) = success, put(key1,v2) = success, get(key1) = v1. Labels: local read, local write, forwarded reads, forwarded writes, anti-entropy. Replicas are shown holding key1=v1 or key1=v2.]
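The N, R, W settings and the "dynamic membership" idea can be illustrated with a small sketch. Everything named here (`QuorumConfig`, `sloppy_replicas`, `write_succeeds`, the fixed ring order and the choice of which node is down) is hypothetical and chosen for illustration; the R + W > N overlap rule is the usual quorum condition and is not stated on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuorumConfig:
    n: int  # replicas in the ideal state
    r: int  # nodes that must answer for a read to succeed
    w: int  # nodes that must acknowledge for a write to succeed

    def overlaps(self) -> bool:
        """True when any R nodes and any W nodes out of the same N must intersect."""
        return self.r + self.w > self.n


def sloppy_replicas(ring_order, start, n, is_up):
    """Walk the ring from `start` and take the first n reachable nodes.

    Unreachable nodes are skipped, so the replica set depends on
    which nodes are available at the time of the request.
    """
    chosen = []
    for i in range(len(ring_order)):
        node = ring_order[(start + i) % len(ring_order)]
        if is_up(node):
            chosen.append(node)
        if len(chosen) == n:
            break
    return chosen


def write_succeeds(acks: int, cfg: QuorumConfig) -> bool:
    """A write is reported as successful once at least W nodes acknowledged it."""
    return acks >= cfg.w


cfg = QuorumConfig(n=3, r=2, w=2)          # values from the slide
ring_order = ["A", "B", "C", "D", "E", "F"]
down = {"C"}                               # assumed failure, for illustration

replicas = sloppy_replicas(ring_order, start=1, n=cfg.n, is_up=lambda node: node not in down)
print(replicas)                 # ['B', 'D', 'E']
print(write_succeeds(2, cfg))   # True
print(cfg.overlaps())           # True
```

Because the replica set shifts when nodes are unavailable, a read may contact nodes that a recent write did not reach even when R + W > N holds, which is consistent with the slide describing "Read Your Writes" as a tunable probability instead of a guarantee.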
Consistency Management
• Each put() creates new, immutable version
• Dynamo tracks version history
• Automatic reconciliation
• Application-level reconciliation
• System Interfaces
  • put(key, object, context)
  • get(key) -> object[], context
[Figure: version history across nodes A, B, F and D. Sequence of calls: put(v1); get() -> v1; put(v2) based on v1; put(v3) based on v1; get() -> [v2,v3]; put(v4) based on [v2,v3].]
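The call sequence on this slide can be replayed with a small version-tracking sketch. The slides do not say how version history is represented; vector clocks are assumed here, as in the published Dynamo design. The simplified `put`/`get` signatures (no key, a single in-memory store) and the choice of which node coordinates each write are hypothetical.

```python
from typing import Dict, List, Tuple

Clock = Dict[str, int]        # node name -> counter (vector clock, an assumption)
Version = Tuple[str, Clock]   # (value, clock)


def descends(a: Clock, b: Clock) -> bool:
    """True if clock a has seen everything recorded in clock b."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def merge(clocks: List[Clock]) -> Clock:
    """Component-wise maximum of the given clocks (returns a new dict)."""
    merged: Clock = {}
    for clock in clocks:
        for node, count in clock.items():
            merged[node] = max(merged.get(node, 0), count)
    return merged


def put(store: List[Version], value: str, context: List[Clock], node: str) -> List[Version]:
    """Create a new immutable version based on the versions named in `context`."""
    clock = merge(context)
    clock[node] = clock.get(node, 0) + 1
    # Automatic reconciliation: drop stored versions the new one supersedes.
    survivors = [(v, c) for v, c in store if not descends(clock, c)]
    return survivors + [(value, clock)]


def get(store: List[Version]):
    """Return all current versions plus the context needed for a later put."""
    return [v for v, _ in store], [c for _, c in store]


store: List[Version] = []
store = put(store, "v1", [], node="A")
_, ctx_v1 = get(store)                      # get() -> v1

store = put(store, "v2", ctx_v1, node="B")  # put(v2) based on v1
store = put(store, "v3", ctx_v1, node="F")  # put(v3) based on v1, concurrent with v2

siblings, ctx_both = get(store)
print(siblings)                             # ['v2', 'v3']

# Application-level reconciliation: the client merges v2 and v3 into v4.
store = put(store, "v4", ctx_both, node="D")
final, _ = get(store)
print(final)                                # ['v4']
```

The two writes based on v1 cannot be ordered against each other, so both survive until a client reads them together and writes back a reconciled value with the combined context.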
Conclusion
• NoSQL databases are specialized storage systems created to solve scalability and availability problems found in relational systems.
• Quorum, versioning, and consistent hashing techniques can be combined to yield a highly available system with user-perceived consistency.