NoSQL Intro

This document provides an introduction and overview of NoSQL databases. It begins by defining NoSQL as a class of non-relational database management systems that do not use SQL for querying and that have flexible schemas. It discusses how NoSQL databases are scaled out rather than up. The document then covers some common NoSQL database types like key-value stores, column family stores, and document databases. It also discusses CAP theorem and BASE consistency compared to ACID transactions. Finally, it lists some examples of where NoSQL is used and provides a brief history.

Uploaded by

DuyNguyễn


Introduction to NoSQL

NoSQL Seminar 2012 @ TUT

Arto Salminen

What is NoSQL?
Class of database management systems (DBMS)
"Not only SQL"

Does not use SQL as its query language

Distributed, fault-tolerant architecture
No fixed schema (formally described structure)
No joins (typical in databases operated with SQL)
A join is an expensive operation that combines records from two or more tables into one set
Joins require strong consistency and fixed schemas

The lack of these makes NoSQL databases more flexible

It is not a replacement for an RDBMS but complements it

Database Scaling
RDBMSs are "scaled up" by adding hardware processing power
NoSQL is "scaled out" by spreading the load
Partitioning (sharding) / replication

[Figure: App, load balancer, and four user shards (Users A-K, Users L-O, Users P-S, Users T-Z), compared with a single database holding Users A-Z]
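A minimal sketch of the range partitioning shown in the figure, assuming users are routed by the first letter of their name. The shard names and the shard_for helper are hypothetical and only illustrate the idea.

```python
import bisect

# Hypothetical shard layout mirroring the figure: each shard owns a range of
# first letters. The upper bounds are inclusive.
SHARD_UPPER_BOUNDS = ["K", "O", "S", "Z"]
SHARD_NAMES = ["users_A_K", "users_L_O", "users_P_S", "users_T_Z"]


def shard_for(username: str) -> str:
    """Return the shard that owns `username`, based on its first letter."""
    if not username:
        raise ValueError("empty username")
    first = username[0].upper()
    if not "A" <= first <= "Z":
        raise ValueError(f"unsupported username: {username!r}")
    # bisect_left finds the first upper bound that is >= the letter.
    index = bisect.bisect_left(SHARD_UPPER_BOUNDS, first)
    return SHARD_NAMES[index]


if __name__ == "__main__":
    for name in ("alice", "Liam", "pat", "Zed"):
        print(name, "->", shard_for(name))
```

Range partitioning keeps neighbouring keys together, but popular letter ranges can overload one shard; hash-based partitioning is a common alternative.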

Relational DB Scaling
At a certain point, a relational database won't scale any further

NoSQL DB Scaling
Scaling horizontally is possible with NoSQL
Scaling up / down is easy
Supports rapid production-ready prototyping
Better handling of traffic spikes

Where Is NoSQL Used?

Google (BigTable, LevelDB)

LinkedIn (Voldemort)
Facebook (Cassandra)
Twitter (Hadoop/HBase, FlockDB, Cassandra)
Netflix (SimpleDB, Hadoop/HBase, Cassandra)
CERN (CouchDB)

History of NoSQL

MultiValue databases at TRW in 1965.

DBM is released by AT&T in 1979.
Lotus Domino released in 1989.
Carlo Strozzi used the term NoSQL in 1998 to name his lightweight, open-source relational database that did not expose the standard SQL interface.
Graph database Neo4j is started in 2000.
Google BigTable is started in 2004. Paper published in 2006.
CouchDB is started in 2005.
The research paper on Amazon Dynamo is released in 2007.
The document database MongoDB is started in 2007 as part of an open-source cloud computing stack, with its first standalone release in 2009.
Facebook open-sources the Cassandra project in 2008.
Project Voldemort started in 2008.
The term NoSQL was reintroduced in early 2009.
Some NoSQL conferences
NoSQL Matters, NoSQL Now!, INOSA

CAP Theorem 1/2

It is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
Consistency (all nodes see the same data at the same time)
Availability (a guarantee that every request receives a response about whether it was successful or failed)
Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)
A distributed system can satisfy any two of these guarantees at the same time, but not all three.
Gilbert and Lynch, Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services, 2002

CAP Theorem 2/2

In other words, CAP can be expressed as "If the network is broken, your database won't work"
"won't work" = down OR inconsistent

In a (single-node) RDBMS we do not have P (network partitions)
Consistency and Availability are achieved

In NoSQL we want to have P

Need to select either C or A
Drop A -> Accept waiting until data is consistent
Drop C -> Accept getting inconsistent data sometimes
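The choice between dropping A and dropping C can be sketched with a toy two-replica store. This is an illustration under stated assumptions, not how any particular database behaves: the TwoNodeStore class, its "CP"/"AP" modes, and the partitioned flag are all hypothetical.

```python
class TwoNodeStore:
    """Two replicas joined by a link that can be cut (a network partition)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [{}, {}]   # one dict per replica
        self.partitioned = False   # True while the replicas cannot talk

    def write(self, node, key, value):
        """Return True if the write was accepted, False if it was refused."""
        if not self.partitioned:
            for replica in self.replicas:   # healthy link: update both sides
                replica[key] = value
            return True
        if self.mode == "CP":
            return False                    # drop A: refuse rather than diverge
        self.replicas[node][key] = value    # drop C: accept on one side only
        return True

    def read(self, node, key):
        return self.replicas[node].get(key)


if __name__ == "__main__":
    for mode in ("CP", "AP"):
        store = TwoNodeStore(mode)
        store.write(0, "x", 1)
        store.partitioned = True
        accepted = store.write(0, "x", 2)
        print(mode, "accepted:", accepted,
              "node0:", store.read(0, "x"), "node1:", store.read(1, "x"))
```

In CP mode the write is refused during the partition and both replicas still agree; in AP mode the write succeeds but the replicas return different values until they are reconciled.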

NoSQL Systems and CAP

http://blog.nahurst.com/visual-guide-to-nosql-systems

ACID vs BASE
The scalability and better performance of NoSQL are achieved by sacrificing ACID compatibility.
Atomic, Consistent, Isolated, Durable
NoSQL systems offer BASE semantics instead.
Basically Available, Soft state, Eventual consistency

ACID -- Requirement for SQL DBs

Atomicity. All of the operations in the transaction will complete, or none will.
Consistency. Transactions never observe or result in inconsistent data.
Isolation. The transaction will behave as if it is the only operation being performed upon the database (i.e. uncommitted transactions are isolated)
Durability. Upon completion of the transaction, the operation will not be reversed (i.e. committed transactions are permanent)
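Atomicity can be seen in a small sketch using Python's built-in sqlite3 module. The accounts table, the names, and the balances are made up for illustration; the CHECK constraint is there only to force the second update to fail.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the credit to dst was rolled back as well


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


if __name__ == "__main__":
    db = make_db()
    print(transfer(db, "alice", "bob", 500), balances(db))
    print(transfer(db, "alice", "bob", 30), balances(db))
```

The oversized transfer credits bob first and then fails on alice's balance check; because both statements run in one transaction, the credit is undone and the balances are unchanged.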

BASE -- Basically Available

Use replication to reduce the likelihood of data unavailability, and use sharding (partitioning the data among many different storage servers) to make any remaining failures partial.
The result is a system that is always available, even if subsets of the data become unavailable for short periods of time.

BASE and Availability

The availability of BASE is achieved through supporting partial failures without total system failure.
Example. If users are partitioned across five database servers, BASE design encourages crafting operations in such a way that a user database failure impacts only the 20 percent of the users on that particular host.
This leads to higher perceived availability of the system. Even though a single node is failing, the interface is still operational.
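The 20 percent figure in the example can be checked with a short sketch, assuming users are spread evenly over the five servers by a simple modulo rule. The server_for and affected_fraction helpers are hypothetical names.

```python
NUM_SERVERS = 5


def server_for(user_id: int) -> int:
    """Assign a user to one of the servers by simple modulo partitioning."""
    return user_id % NUM_SERVERS


def affected_fraction(user_ids, failed_servers):
    """Fraction of users whose data lives on a failed server."""
    user_ids = list(user_ids)
    hit = sum(1 for uid in user_ids if server_for(uid) in failed_servers)
    return hit / len(user_ids)


if __name__ == "__main__":
    # One of five servers is down: only that server's users are affected.
    print(affected_fraction(range(1000), failed_servers={3}))
```

With 1000 evenly spread users and one failed server, 200 users are affected and the other 800 keep working.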

BASE -- Eventually Consistent

Although applications must cope with the lack of instantaneous consistency, NoSQL systems ensure that at some future point in time the data assumes a consistent state.
In contrast to ACID systems that enforce consistency at transaction commit, NoSQL guarantees consistency only at some undefined future time.
Where ACID is pessimistic and forces consistency at the end of every operation, BASE is optimistic and accepts that the database consistency will be in a state of flux.
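A toy sketch of eventual consistency, assuming a "highest version wins" rule and a periodic background exchange between replicas. Both are simplifying assumptions; real systems use a variety of conflict-resolution schemes. VersionedReplica and anti_entropy are hypothetical names.

```python
class VersionedReplica:
    """Stores key -> (version, value) and keeps the highest version seen."""

    def __init__(self):
        self.store = {}

    def write(self, key, value, version):
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def anti_entropy(a, b):
    """Exchange entries both ways so the highest version wins on each side."""
    for src, dst in ((a, b), (b, a)):
        for key, (version, value) in list(src.store.items()):
            dst.write(key, value, version)


if __name__ == "__main__":
    r1, r2 = VersionedReplica(), VersionedReplica()
    r1.write("cart", ["book"], version=1)
    anti_entropy(r1, r2)
    r1.write("cart", ["book", "pen"], version=2)
    print("before sync:", r2.read("cart"))  # r2 still serves the old value
    anti_entropy(r1, r2)
    print("after sync: ", r2.read("cart"))  # replicas have converged
```

Between the write and the next exchange, a reader of the second replica sees stale data; after the exchange both replicas return the same value.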

BASE and Consistency

As DB nodes are added while scaling out, a need for synchronization arises
If absolute consistency is required, nodes need to communicate whenever read/write operations are performed on a node
Consistency over availability -> bottleneck

As a trade-off, "eventual consistency" is used

Consistency is reached later
Numerous approaches for maintaining "distributed consistency" are available
Amazon Dynamo - consistent hashing
CouchDB - asynchronous master-master replication
MongoDB - auto-sharding + replication cluster with a master server
BASE -- Soft State

While ACID systems assume that data consistency is a hard requirement, NoSQL systems allow data to be inconsistent and relegate designing around such inconsistencies to application developers.
In other words, soft state indicates that the state of the system may change over time, even without input.
This is because of the eventual consistency model (the acronym is a bit contrived).

Some breeds of NoSQL solutions

Key-Value Stores
Column Family Stores
Document Databases
Graph Databases
In addition: Object and RDF databases as well as Tuple stores

Key-Value Stores
Dynamo, Voldemort, Rhino DHT ...

DeCandia et al. "Dynamo: Amazon's Highly Available Key-value Store", 2007

A key-value store is based on a hash table in which a unique key points to a particular item of data.
Mappings are usually accompanied by cache mechanisms to maximize performance.
The API is typically simple -- the implementation is often complex.
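The simple API can be sketched as get/put/delete over a hash table, with a small cache in front. This is an illustrative toy, not the design of Dynamo or Voldemort; the KeyValueStore class, the LRU eviction policy, and the backing_reads counter are assumptions.

```python
from collections import OrderedDict


class KeyValueStore:
    """get/put/delete over a dict, with a small LRU cache in front."""

    def __init__(self, cache_size=2):
        self._backing = {}            # stands in for slower persistent storage
        self._cache = OrderedDict()   # most recently used keys at the end
        self._cache_size = cache_size
        self.backing_reads = 0        # how often a get missed the cache

    def put(self, key, value):
        self._backing[key] = value
        self._remember(key, value)

    def get(self, key, default=None):
        if key in self._cache:
            self._cache.move_to_end(key)
            return self._cache[key]
        self.backing_reads += 1
        if key not in self._backing:
            return default
        value = self._backing[key]
        self._remember(key, value)
        return value

    def delete(self, key):
        self._backing.pop(key, None)
        self._cache.pop(key, None)

    def _remember(self, key, value):
        self._cache[key] = value
        self._cache.move_to_end(key)
        while len(self._cache) > self._cache_size:
            self._cache.popitem(last=False)  # evict the least recently used key


if __name__ == "__main__":
    store = KeyValueStore(cache_size=2)
    for k, v in (("a", 1), ("b", 2), ("c", 3)):
        store.put(k, v)
    print(store.get("c"), store.backing_reads)  # served from the cache
    print(store.get("a"), store.backing_reads)  # fetched from backing storage
```

Everything a caller sees is three operations on opaque keys; replication, partitioning, and persistence are where real implementations become complex.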

Column Family Stores

BigTable, Cassandra, HBase, Hadoop ...

Chang et al. "Bigtable: A Distributed Storage System for Structured Data", 2006

Store and process very large amounts of data distributed over many machines.
"Petabytes of data across thousands of servers"

Keys point to multiple columns

[Figure: Cassandra example. Row keys jim_87 and jill_90 each point to the columns Age, Name, Gender, and Phone. Values that survive in this copy: Jim, 123456 (jim_87) and Jill, 654321 (jill_90).]
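The "keys point to multiple columns" model can be sketched as a nested mapping from row key to columns. This is a plain-Python illustration, not Cassandra's API; pairing Jim/Jill with Name and the numbers with Phone is an assumption based on the figure, and the helper names are hypothetical.

```python
# Row key -> {column name -> value}. Rows need not share the same columns.
users = {
    "jim_87": {"Name": "Jim", "Phone": "123456"},
    "jill_90": {"Name": "Jill", "Phone": "654321"},
}


def get_columns(table, row_key, columns):
    """Read only the requested columns of one row; absent columns are skipped."""
    row = table.get(row_key, {})
    return {name: row[name] for name in columns if name in row}


def put_column(table, row_key, column, value):
    """Set one column, creating the row on first write."""
    table.setdefault(row_key, {})[column] = value


if __name__ == "__main__":
    print(get_columns(users, "jim_87", ["Name", "Age"]))
    put_column(users, "jill_90", "Gender", "F")  # made-up value
    print(users["jill_90"])
```

Because each row carries its own set of columns, adding a column to one row needs no schema change for the others.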

Document Databases (Stores)

CouchDB, MongoDB, Lotus Notes, Redis ...
Documents are addressed in the database via a unique key that represents that document.
Semi-structured documents can be XML or JSON formatted, for instance.
In addition to the key, documents can be retrieved with queries.
Redis is sometimes referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets.
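Lookup by key and retrieval by query can be sketched with JSON documents held in a dictionary. This is a toy, not the CouchDB or MongoDB API; the DocumentStore class and its equality-only find method are assumptions, and the sample documents are made up.

```python
import json
import uuid


class DocumentStore:
    def __init__(self):
        self._docs = {}  # document key -> JSON text

    def insert(self, document, key=None):
        """Store a document and return its key (generated if not given)."""
        key = key or uuid.uuid4().hex
        self._docs[key] = json.dumps(document)
        return key

    def get(self, key):
        """Fetch one document by its unique key."""
        raw = self._docs.get(key)
        return None if raw is None else json.loads(raw)

    def find(self, **criteria):
        """Return (key, document) pairs whose top-level fields equal the criteria."""
        results = []
        for key, raw in self._docs.items():
            doc = json.loads(raw)
            if all(doc.get(field) == wanted for field, wanted in criteria.items()):
                results.append((key, doc))
        return results


if __name__ == "__main__":
    store = DocumentStore()
    store.insert({"name": "Bob", "city": "Tampere"}, key="u1")
    store.insert({"name": "Alice", "city": "Helsinki", "tags": ["admin"]}, key="u2")
    print(store.get("u1"))
    print(store.find(city="Helsinki"))
```

The two documents have different fields, which is what "semi-structured" means in practice; the full scan in find is where real document databases use indexes.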

Graph Databases
Neo4j, FlockDB, GraphBase, InfoGrid, ...
Graph databases are built from nodes, relationships between nodes (edges), and the properties of nodes.
Nodes represent entities (e.g. "Bob" or "Alice").
Similar in nature to objects in object-oriented programming.
Properties are pertinent information related to nodes (e.g. age: 18).
Edges connect nodes to nodes or nodes to properties.
They represent the relationship between the two.
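Nodes, properties, and edges can be sketched with two dictionaries. This is an in-memory illustration, not the Neo4j API; the Graph class, the KNOWS relationship, and the neighbours method are hypothetical.

```python
from collections import defaultdict


class Graph:
    def __init__(self):
        self.properties = {}            # node -> {property: value}
        self.edges = defaultdict(list)  # node -> [(relationship, other node)]

    def add_node(self, name, **props):
        self.properties[name] = dict(props)

    def add_edge(self, src, relationship, dst):
        """Add a directed, labelled edge from src to dst."""
        self.edges[src].append((relationship, dst))

    def neighbours(self, node, relationship):
        """Nodes reachable from `node` over edges with the given label."""
        return [dst for rel, dst in self.edges.get(node, []) if rel == relationship]


if __name__ == "__main__":
    g = Graph()
    g.add_node("Bob")
    g.add_node("Alice", age=18)
    g.add_edge("Bob", "KNOWS", "Alice")
    print(g.neighbours("Bob", "KNOWS"), g.properties["Alice"])
```

Following an edge is a direct lookup from a node to its neighbours, which is why relationship-heavy queries that would need joins in an RDBMS are natural here.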

Scaling graph DBs is problematic

Neo4j: cache sharding, sharding strategy heuristics

Some NoSQL Challenges

Lack of maturity -- numerous solutions still in their beta stages
Lack of commercial support for enterprise users
Lack of support for data analysis
Maintenance efforts and skills are required -- experts are hard to find

References and Material

Michael Stonebraker, Samuel Madden, Daniel J. Abadi, Stavros Harizopoulos, Nabil Hachem, and Pat Helland. 2007. The end of an architectural era: (it's time for a complete rewrite). In Proceedings of the 33rd international conference on Very large data bases (VLDB '07). VLDB Endowment, 1150-1160.
Werner Vogels. 2008. Eventually Consistent. Queue 6, 6 (October 2008), 14-19. DOI=10.1145/1466443.1466448
Dan Pritchett. 2008. BASE: An Acid Alternative. Queue 6, 3 (May 2008), 48-55. DOI=10.1145/1394127.1394128
Seth Gilbert and Nancy Lynch. 2002. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. SIGACT News 33, 2 (June 2002), 51-59. DOI=10.1145/564585.564601
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. 2006. Bigtable: a distributed storage system for structured data. In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7 (OSDI '06), Vol. 7. USENIX Association, Berkeley, CA, USA, 15-15.
Avinash Lakshman and Prashant Malik. 2010. Cassandra: a decentralized structured storage system. SIGOPS Oper. Syst. Rev. 44, 2 (April 2010), 35-40. DOI=10.1145/1773912.1773922
CouchDB - The Definitive Guide, http://guide.couchdb.org/index.html
HP whitepaper: There is no free lunch with distributed data
Long list of NoSQL papers: http://nosqlsummer.org/papers

Possible Presentation Topics

NoSQL architectures
Key-Value Store
Graph
Big Table (Columnar)
Document Store
...
Implementations
MongoDB
HBase
Cassandra
CouchDB
Google App Engine Datastore
Hadoop
BigData
Redis
Riak
Neo4j

Hosted services
Freebase
OpenLink Virtuoso
Datastore on Google App Engine
Amazon DynamoDB
Cloudant Data Layer (CouchDB)
...
Technologies / misc
MapReduce
Fault tolerance
Taxonomy
Challenges / Limitations
Tools
Use Cases
Distributed databases
Parallel systems
...
