NoSQL Databases: MongoDB vs Cassandra
Introduction • What is a Database? • “… a repository with organized and structured data, …” (Abramova & Bernardino, 2013-07) • Data can be accessed using a DBMS (Database Management System) • What is a DBMS? • “DBMS can be defined as a collection of mechanisms that enables storage, edit and extraction of data” (Abramova & Bernardino, 2013-07)
SQL • SQL: Structured Query Language • Became the standard for: • Data interaction • Data manipulation • Data is stored as a set of tables • Data from different tables can be accessed at the same time.
NoSQL • Carlo Strozzi introduced the name NoSQL in 1998; back then, it referred to an open-source database that did not use an SQL interface. • Carlo Strozzi preferred to call it “noseequel” or “NoRel” • Principal difference • Became popular after a conference held in San Francisco in 2009 • Why do we need NoSQL? • In SQL, the efficiency of information extraction degrades as the amount of data stored and used grows
CAP theorem • Based on the CAP theorem, the following guarantees can be defined: • Consistency • Availability • Partition tolerance • Relational and NoSQL principles are derived from the CAP theorem
ACID • “ACID is a principle based on CAP theorem and used as set of rules for relational database transactions.” (Abramova & Bernardino, 2013-07) • ACID guarantees: • Atomic • Consistent • Isolated • Durable • What if the amount of data is large? • ACID may be hard to accomplish!
BASE Principle & NoSQL • BASE principle: • Basically Available • Soft state • Eventually consistent • BASE still follows the CAP theorem. • If the system is distributed, only two of the three CAP guarantees can be selected.
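To make "eventually consistent" concrete, here is a minimal sketch in Python. It is a toy model, not how any particular database works: the class names (`Replica`, `EventuallyConsistentStore`) and the explicit `sync()` step are assumptions made for illustration. A write is accepted on one replica right away, the other replicas may return stale data for a while, and all replicas agree once propagation has run.

```python
class Replica:
    """One copy of the data, held as a plain dict (hypothetical helper)."""

    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    """Toy store: writes land on replica 0 and reach the others later."""

    def __init__(self, replica_count=3):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []  # writes not yet propagated to the other replicas

    def write(self, key, value):
        # The write is accepted immediately (basically available).
        self.replicas[0].data[key] = value
        self.pending.append((key, value))

    def read(self, key, replica_index):
        # A read may hit any replica, so it may see stale data (soft state).
        return self.replicas[replica_index].data.get(key)

    def sync(self):
        # Propagation step: afterwards every replica holds the same data.
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica.data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("user:1", "Ana")
stale = store.read("user:1", replica_index=2)  # replica 2 has not seen the write
store.sync()
fresh = store.read("user:1", replica_index=2)  # replica 2 has caught up
```

In a real system propagation happens in the background rather than through an explicit call; the explicit `sync()` only makes the stale window visible.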
Types of NoSQL Databases • More than 150 different NoSQL databases • Based on the same principles • Have some different characteristics • Categories: • Key-value Store • Document Store • Column-family • Graph database
Key-value store • Data is stored as pairs of key and value • All keys are unique • Data access is done by relating those keys to values • A hash table holds all keys so that the matching value can be found when needed
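The key-value model described above can be sketched with a Python dictionary, which is itself a hash table. The class name `KeyValueStore` and its methods are hypothetical and chosen only for illustration; real key-value stores add persistence, distribution and more.

```python
class KeyValueStore:
    """Toy key-value store backed by a dict (a hash table)."""

    def __init__(self):
        self._table = {}

    def put(self, key, value):
        # Keys are unique: writing an existing key replaces its value.
        self._table[key] = value

    def get(self, key, default=None):
        # Access goes through the key only; values are opaque to the store.
        return self._table.get(key, default)

    def delete(self, key):
        self._table.pop(key, None)


kv = KeyValueStore()
kv.put("session:42", {"user": "ana"})
kv.put("session:42", {"user": "bob"})  # same key, so the old value is replaced
current = kv.get("session:42")
missing = kv.get("session:99")
```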
Document Store • Databases are defined as sets of key-value stores that get transformed into documents. • Each document is identified by a unique key • Data access can be done using: • key • specific value
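The two access paths named above (by key, or by a specific value) can be sketched as follows. This is a toy in-memory model with hypothetical names (`DocumentStore`, `insert`, `get`, `find`); it does not reproduce the API of MongoDB or any other product.

```python
class DocumentStore:
    """Toy document store: each document is a dict of field/value pairs."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc_id, document):
        # Each document is identified by a unique key.
        if doc_id in self._docs:
            raise KeyError(f"duplicate document key: {doc_id!r}")
        self._docs[doc_id] = dict(document)

    def get(self, doc_id):
        # Access by key.
        return self._docs.get(doc_id)

    def find(self, field, value):
        # Access by a specific value: scan documents for a matching field.
        return [doc_id for doc_id, doc in self._docs.items()
                if doc.get(field) == value]


docs = DocumentStore()
docs.insert("u1", {"name": "Ana", "city": "Coimbra"})
docs.insert("u2", {"name": "Rui", "city": "Lisbon"})
docs.insert("u3", {"name": "Eva", "city": "Coimbra"})
by_key = docs.get("u2")
by_value = docs.find("city", "Coimbra")
```

The linear scan in `find` is the simplest possible choice; a real document store would normally use an index for value lookups.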
Column Family • Similar to the relational database model • Structure: • Column • Super-Column • Column family • The structure of the database is defined by super-columns and column families. • Data access is accomplished by specifying column family, key and column in order to get the value, using the following structure: • <columnFamily>.<key>.<column> = <value>
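The access structure `<columnFamily>.<key>.<column> = <value>` maps naturally onto nested dictionaries. The sketch below is a toy model with hypothetical names (`ColumnFamilyStore`, `set`, `get`), and it leaves out super-columns for brevity.

```python
from collections import defaultdict


class ColumnFamilyStore:
    """Toy model of <columnFamily>.<key>.<column> = <value>."""

    def __init__(self):
        # column family -> row key -> column name -> value
        self._families = defaultdict(lambda: defaultdict(dict))

    def set(self, family, key, column, value):
        self._families[family][key][column] = value

    def get(self, family, key, column):
        # .get() is used at every level so that a lookup never creates entries.
        return self._families.get(family, {}).get(key, {}).get(column)


store = ColumnFamilyStore()
store.set("Users", "u1", "name", "Ana")
store.set("Users", "u1", "email", "ana@example.com")
store.set("Users", "u2", "name", "Rui")  # rows need not share the same columns
name = store.get("Users", "u1", "name")
no_email = store.get("Users", "u2", "email")
```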
Graph database • These databases are used when data can be represented as a graph, for example, social networks.
MONGODB • “MongoDB is an open source NoSQL database developed in C++” (Abramova & Bernardino, 2013-07). • MongoDB is a document store database • Documents are gathered into groups according to their structure • CAP theorem • Consistency • Partition tolerance
MONGODB (Cont.) • Description • Data is sent to disk every 60 seconds. • Everything is flushed to disk once new files are created • Each document is identified by the “_id” field • An index for the “_id” field is created • Characteristics • Durability • Concurrency
MongoDB Characteristics • Durability • Durability of data is accomplished by the creation of replicas. • Master-Slave technique • Master: read & write • Slave: read • The slave with the most recent data becomes Master if the Master goes down • Replicas are asynchronous • Concurrency • Locks
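The master-slave behaviour described above can be sketched as a small simulation. It only illustrates the idea (writes go to the master, slaves are updated asynchronously, the most up-to-date slave is promoted on failure); it is not MongoDB's actual replication or election protocol, and all names (`Node`, `ReplicaSet`, `fail_over`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    last_applied: int = 0  # sequence number of the newest write this node holds
    data: dict = field(default_factory=dict)


class ReplicaSet:
    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        self.write_count = 0

    def write(self, key, value):
        # Only the master accepts writes.
        self.write_count += 1
        self.master.data[key] = value
        self.master.last_applied = self.write_count

    def replicate_to(self, slave):
        # Asynchronous replication: runs separately from write(),
        # so a slave can lag behind the master.
        slave.data = dict(self.master.data)
        slave.last_applied = self.master.last_applied

    def fail_over(self):
        # The master went down: promote the slave with the most recent data.
        new_master = max(self.slaves, key=lambda node: node.last_applied)
        self.slaves.remove(new_master)
        self.master = new_master
        return new_master


rs = ReplicaSet(Node("m"), [Node("s1"), Node("s2")])
rs.write("a", 1)
rs.replicate_to(rs.slaves[0])  # s1 holds write 1 only
rs.write("b", 2)
rs.replicate_to(rs.slaves[1])  # s2 holds writes 1 and 2
promoted = rs.fail_over()
```

Because replication is asynchronous, any write the old master accepted but had not yet replicated would be missing on the promoted node; the sketch makes that lag visible through `last_applied`.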
CASSANDRA • “Cassandra is a NoSQL database developed by Apache Software Foundation; written in Java” (Abramova & Bernardino, 2013-07) • Similar to the usual relational model • The difference is that stored data can be: • semi-structured • unstructured • CAP theorem • Partition tolerance • High Availability • Designed to store large amounts of data and deal with huge volumes efficiently.
CASSANDRA (Cont.) • Peer-to-peer architecture (NO MASTER) • High availability • High scalability • Replicates data over multiple nodes in a cluster. • Replication Factor: total number of replicas. • RF(1): 1 copy of each row on 1 node • RF(2): 2 copies of each row on 2 nodes • Failed nodes are detected using “gossip” protocols and are replaced with no downtime
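The replication factor can be illustrated with a simplified ring placement: hash the row key to a position on a ring of nodes, then store copies on that node and the following ones. This is a sketch under stated assumptions, not Cassandra's actual partitioner or replication code; the functions `token` and `replicas_for`, the use of MD5, and the node names are all chosen for illustration.

```python
import hashlib


def token(value):
    """Map a string to a position on the ring (MD5 is an illustrative choice)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def replicas_for(row_key, nodes, replication_factor):
    """Return the nodes that would hold copies of the row with this key."""
    if replication_factor > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    ring = sorted(nodes, key=token)  # nodes ordered by ring position
    key_token = token(row_key)
    # First node at or after the key's position, wrapping around to index 0.
    start = next((i for i, node in enumerate(ring)
                  if token(node) >= key_token), 0)
    # Further copies go to the next nodes clockwise on the ring.
    return [ring[(start + i) % len(ring)] for i in range(replication_factor)]


nodes = ["node-a", "node-b", "node-c", "node-d"]
rf1 = replicas_for("row-17", nodes, replication_factor=1)  # 1 copy on 1 node
rf2 = replicas_for("row-17", nodes, replication_factor=2)  # 2 copies on 2 nodes
```

There is no master in this scheme: every node runs the same placement function, which matches the peer-to-peer design on the slide.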
CASSANDRA (Cont.) • Replication Strategy: • Simple: single data center • Network Topology: multiple data centers • Cassandra Characteristics: • Durability: • Two replication types: • Synchronous • Asynchronous • All writes and redundancies are recorded in a commit log. • Indexing: • “Each node maintains the indexes of the table it manages” • Data is manipulated using CQL (Cassandra Query Language)
YCSB • “The YCSB – Yahoo! Cloud Serving Benchmark is one of the most used benchmarks to test NoSQL databases” (Abramova & Bernardino, 2013-07). • YCSB has a client that consists of two parts: • Workload generator • Set of workloads • Workloads are combinations of read, write and update operations performed on randomly chosen records.
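The idea of a workload as a mix of operations on randomly chosen records can be sketched in a few lines of Python. This is not the YCSB client: the function name `generate_operations`, its parameters, the `user<N>` key format and the uniform choice of records are assumptions made for illustration.

```python
import random


def generate_operations(record_count, operation_count, read_proportion, seed=0):
    """Return a list of (operation, record_key) pairs for a read/update mix."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    operations = []
    for _ in range(operation_count):
        # Choose the operation type according to the requested proportion.
        kind = "read" if rng.random() < read_proportion else "update"
        # Choose the target record uniformly at random (an assumption).
        record = rng.randrange(record_count)
        operations.append((kind, f"user{record}"))
    return operations


# A mix of half reads and half updates over 1,000 records.
mixed = generate_operations(record_count=1000, operation_count=10000,
                            read_proportion=0.5)
read_share = sum(1 for kind, _ in mixed if kind == "read") / len(mixed)

read_only = generate_operations(1000, 100, read_proportion=1.0)
update_only = generate_operations(1000, 100, read_proportion=0.0)
```

Changing `read_proportion` alone is enough to express read-heavy, update-heavy, read-only and update-only mixes.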
Workload A: 50% reads & 50% updates. Abramova, V., & Bernardino, J. (2013-07). NoSQL Databases: MongoDB vs Cassandra.
Workload B: 95% reads & 5% updates
Workload C: 100% reads
Workload F: Read-Modify-Write
Workload G: 5% reads & 95% updates
Workload H: 100% updates