NoSQL

Big Data (some old numbers)
• Facebook:
  - 130 TB/day: user logs
  - 200-400 TB/day: 83 million pictures

• Google: > 25 PB/day processed data

• Gene sequencing: 100M kilobases per day per machine
  - Sequencing one human cell costs Illumina $1k
  - Sequence one cell for every infant
  - 10 trillion cells per human body

• Total data created in 2010: 1 zettabyte (1,000,000 PB) per year
  - ~60% increase every year
Big data is not only databases
• Big data is more about data analytics and online querying

Many components:
• Storage systems
• Database systems
• Data mining and statistical algorithms
• Visualization

What is NoSQL?
• An emerging “movement” around non-relational software for Big Data
• Roots are in the Google and Amazon homegrown software stacks

• Wikipedia: “A NoSQL database provides a mechanism for storage and retrieval of data that use looser consistency models than traditional relational databases in order to achieve horizontal scaling and higher availability. Some authors refer to them as "Not only SQL" to emphasize that some NoSQL systems do allow SQL-like query language to be used.”
Some NoSQL Components (top to bottom of the stack)
• Analytics interface (Pig, Hive, …) and imperative languages (RoR, Java, Scala, …)
• Data-parallel processing (MapReduce/Hadoop)
• Distributed key/value or column store (Cassandra, HBase, Redis, …)
• Scalable file system (GFS, HDFS, …)

NoSQL features
• Scalability is crucial!
  - load has increased rapidly for many applications
• Large servers are expensive

• Solution: use clusters of small commodity machines
  - need to partition the data (sharding) and replicate it
  - cheap (usually open source!)
  - cloud-based storage
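To make "partition the data and replicate it" concrete, here is a minimal sketch assuming simple hash partitioning over a fixed list of machines. The node names, `REPLICATION_FACTOR`, and the helper functions are hypothetical; real systems use more elaborate schemes (MongoDB offers ranged and hashed sharding, Cassandra places keys on a consistent-hashing ring).

```python
import hashlib
from typing import List

NODES = ["node-0", "node-1", "node-2", "node-3"]  # hypothetical commodity machines
REPLICATION_FACTOR = 2  # assumption: every key is stored on 2 distinct nodes


def shard_for(key: str) -> int:
    """Pick the key's home shard by hashing it (sha256 is stable across runs, unlike hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % len(NODES)


def replicas_for(key: str) -> List[str]:
    """The home node plus the next node(s) in ring order each hold a copy."""
    first = shard_for(key)
    return [NODES[(first + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]


for user in ["alice", "bob", "carol"]:
    print(user, "-> shard", shard_for(user), "stored on", replicas_for(user))
```

Plain modulo hashing is the simplest choice, but it moves most keys when a node is added; that cost is one reason production systems prefer ranges or consistent hashing.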

NoSQL features
• Sometimes there is no well-defined schema

• Allow for semi-structured data
  - still need to provide ways to query efficiently (use of index methods)
  - need to express specific types of queries easily

Flavors of NoSQL

Four main types:
• key-value stores
• document databases
• column-family (big-table) stores
• graph databases

=> Here we will talk more about “document” databases (MongoDB)

Key-Value Stores

There are many systems like that: Redis, MemcacheDB, Amazon's DynamoDB, Voldemort

• Simple data model: key/value pairs
  - the DBMS does not attempt to interpret the value

• Queries are limited to query by key
  - get/put/update/delete a key/value pair
  - iterate over key/value pairs
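The operations listed above can be sketched as a toy in-memory store. This is an illustration, not the API of Redis or any other system named on the slide: the class and method names are hypothetical. Values are opaque `bytes`, so the only way to reach data is through its key.

```python
from typing import Dict, Iterator, Optional, Tuple


class KeyValueStore:
    """Toy in-memory key/value store; values are opaque bytes it never parses."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def update(self, key: str, value: bytes) -> bool:
        """Overwrite an existing key; return False if the key is absent."""
        if key not in self._data:
            return False
        self._data[key] = value
        return True

    def delete(self, key: str) -> bool:
        """Remove a key; return False if it was not present."""
        if key not in self._data:
            return False
        del self._data[key]
        return True

    def items(self) -> Iterator[Tuple[str, bytes]]:
        """Iterate over key/value pairs (sorted by key in this sketch)."""
        for key in sorted(self._data):
            yield key, self._data[key]
```

A question such as "which users are named Tom?" cannot be answered by key. It requires iterating over every pair and decoding each value in the application, because the store itself does not interpret values.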

Document Databases
Examples include: MongoDB, CouchDB, Terrastore

• Also store key/value pairs
  - however, the value is a document
  - expressed using some sort of semi-structured data model: XML or, more often, JSON or BSON (JSON's binary counterpart)
  - the value can be examined and used by the DBMS (unlike in key/value stores)
• Queries can be based on the key (as in key/value stores), but more often they are based on the contents of the document.

• Here again, there is support for sharding and replication.
  - the sharding can be based on values within the document
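The key difference from a key/value store is that the DBMS parses the value, so it can answer queries on document contents. A minimal sketch, assuming JSON values and equality-only filters; the class and method names are hypothetical and only loosely echo MongoDB's `find({...})` style.

```python
import json
from typing import Any, Dict, List


class DocumentCollection:
    """Toy document store: each value is a JSON document the store can inspect."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, Any]] = {}

    def insert(self, key: str, json_text: str) -> None:
        # Parsing the value is what distinguishes this from an opaque key/value store.
        self._docs[key] = json.loads(json_text)

    def get(self, key: str) -> Dict[str, Any]:
        """Query by key, exactly as in a key/value store."""
        return self._docs[key]

    def find(self, criteria: Dict[str, Any]) -> List[Dict[str, Any]]:
        """Query by content: documents whose fields equal every criterion."""
        return [
            doc for doc in self._docs.values()
            if all(field in doc and doc[field] == value for field, value in criteria.items())
        ]
```

Documents that lack a queried field simply do not match. That is one natural way to handle semi-structured data in which not every document has the same fields.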
The Structure Spectrum
• Structured (schema-first): relational databases, formatted messages
• Semi-structured (schema-later): documents, XML, tagged text/media
• Unstructured (schema-never): plain text, media
MongoDB (An Example of a Document Database)
- Data are organized in collections. A collection stores a set of documents.
- A collection is like a table, and a document is like a record
- but: each document can have a different set of attributes, even in the same collection
- Semi-structured schema!
- Only requirement: every document must have an “_id” field

Example MongoDB documents

{ "_id": ObjectId("4efa8d2b7d284dad101e4bc9"),
  "Last Name": "Cousteau",
  "First Name": "Jacques-Yves",
  "Date of Birth": "06-11-1910" }

{ "_id": ObjectId("4efa8d2b7d284dad101e4bc7"),
  "Last Name": "PELLERIN",
  "First Name": "Franck",
  "Date of Birth": "09-19-1983",
  "Address": "1 chemin des Loges",
  "City": "VERSAILLES" }

Example Document Database: MongoDB
Key features include:
• JSON-style documents
  - actually uses BSON (JSON's binary format)
• replication for high availability
• auto-sharding for scalability
• document-based queries
• can create an index on any attribute
  - for faster reads
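Why an index on an attribute speeds up reads can be sketched with a secondary index that maps each field value to the `_id`s holding it. In MongoDB itself this is done with `db.collection.createIndex()`. The toy class below is hypothetical and much simpler: equality lookups only, and indexed values must be hashable.

```python
from collections import defaultdict
from typing import Any, Dict, List


class IndexedCollection:
    """Toy collection with optional secondary indexes: field value -> list of _ids."""

    def __init__(self) -> None:
        self.docs: Dict[Any, Dict[str, Any]] = {}
        self.indexes: Dict[str, Dict[Any, List[Any]]] = {}

    def create_index(self, field: str) -> None:
        index: Dict[Any, List[Any]] = defaultdict(list)
        for _id, doc in self.docs.items():  # build from existing documents
            if field in doc:
                index[doc[field]].append(_id)
        self.indexes[field] = index

    def insert(self, doc: Dict[str, Any]) -> None:
        if doc["_id"] in self.docs:
            raise KeyError(f"duplicate _id {doc['_id']!r}")
        self.docs[doc["_id"]] = doc
        for field, index in self.indexes.items():  # keep every index up to date
            if field in doc:
                index[doc[field]].append(doc["_id"])

    def find_eq(self, field: str, value: Any) -> List[Dict[str, Any]]:
        if field in self.indexes:
            # Index lookup: touches only the matching documents.
            return [self.docs[i] for i in self.indexes[field].get(value, [])]
        # No index: full collection scan.
        return [d for d in self.docs.values() if field in d and d[field] == value]
```

The trade-off is visible in `insert`: every index must be updated on each write. That is why indexes help reads but are not free.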

MongoDB Terminology
relational term   <==>  MongoDB equivalent
----------------------------------------------------------
database          <==>  database
table             <==>  collection
row               <==>  document
attributes        <==>  fields (field-name:value pairs)
primary key       <==>  the _id field, which is the key associated with the document

JSON
• JSON (JavaScript Object Notation) is an alternative data model for semi-structured data.

• Built on two key structures:
  - an object, which is an unordered set of name/value pairs
    { "_id": "1000",
      "name": "Sanders Theatre",
      "capacity": 1000 }
  - an array of values: [ "123", "222", "333" ]
• A value can be:
  - an atomic value: string, number, true, false, null
  - an object
  - an array
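One way to see the two structures and the allowed value kinds is to parse the slide's examples with Python's standard `json` module. Objects become `dict`, arrays become `list`, and `true`/`false`/`null` become `True`/`False`/`None`. The `json_kind` helper is hypothetical and only labels each Python value with its JSON kind. BSON, which MongoDB stores, adds types such as ObjectId and dates that plain JSON lacks.

```python
import json

doc = json.loads('{"_id": "1000", "name": "Sanders Theatre", "capacity": 1000}')  # object -> dict
values = json.loads('[ "123", "222", "333" ]')                                     # array  -> list
nested = json.loads('{"tags": ["a", "b"], "open": true, "manager": null, "seats": {"floor": 700}}')


def json_kind(value):
    """Name the JSON kind of a parsed value (bool is checked before int: bool subclasses int)."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")


for field, value in nested.items():
    print(field, "->", json_kind(value))
```

The nested document shows the recursion in the definition: a value can itself be an object or an array, so documents can contain sub-documents and lists.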
The _id Field
Every MongoDB document must have an _id field.
• its value must be unique within the collection
• acts as the primary key of the collection
• it is the key in the key/value pair
• If you create a document without an _id field:
  - MongoDB adds the field for you
  - assigns it a unique BSON ObjectId
  - example from the MongoDB shell:
> db.test.save({ rating: "PG-13" })
> db.test.find()
{ "_id" : ObjectId("528bf38ce6d3df97b49a0569"), "rating" : "PG-13" }
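The automatic `_id` behaviour can be imitated in a few lines. An ObjectId is 12 bytes, which is why the shell prints it as 24 hex digits. The bytes are a 4-byte timestamp (seconds since the Unix epoch), a 5-byte random value fixed per process, and a 3-byte incrementing counter. The sketch below follows that layout, but `make_object_id` and `insert` are hypothetical helpers, not driver APIs. MongoDB enforces uniqueness with an automatic unique index on `_id`, whereas this sketch uses a linear scan.

```python
import itertools
import os
import time
from typing import Any, Dict, List

_PROCESS_RANDOM = os.urandom(5)                                # 5 random bytes, fixed per process
_counter = itertools.count(int.from_bytes(os.urandom(3), "big"))  # counter starts at a random value


def make_object_id() -> str:
    """24 hex digits: 4-byte timestamp + 5-byte process random + 3-byte counter."""
    timestamp = int(time.time()).to_bytes(4, "big")
    count = (next(_counter) % (1 << 24)).to_bytes(3, "big")
    return (timestamp + _PROCESS_RANDOM + count).hex()


def insert(collection: List[Dict[str, Any]], doc: Dict[str, Any]) -> Dict[str, Any]:
    """Add an _id if missing, reject duplicate _ids, then store a copy of the document."""
    doc = dict(doc)
    if "_id" not in doc:
        doc["_id"] = make_object_id()
    if any(existing["_id"] == doc["_id"] for existing in collection):
        raise ValueError(f"duplicate _id: {doc['_id']}")
    collection.append(doc)
    return doc


test_collection: List[Dict[str, Any]] = []
print(insert(test_collection, {"rating": "PG-13"}))
```

Because the leading bytes are a timestamp, ObjectIds generated later tend to sort later, and ids can be produced on the client without contacting the server.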
Data Modeling in MongoDB
Need to determine how to map entities and relationships => collections of documents
• Could in theory give each type of entity:
  - its own (flexibly formatted) type of document
  - with all documents of that type stored in the same collection
• However, it can make sense to group different types of entities together.
  - create an aggregate containing data that tends to be accessed together

Capturing Relationships in MongoDB
• Two options:
  1. store references to other documents using their _id values
  2. embed documents within other documents

Example relationships
Consider the following example documents:

{
  "_id": ObjectId("52ffc33cd85242f436000001"),
  "name": "Tom Hanks",
  "contact": "987654321",
  "dob": "01-01-1991"
}

{
  "_id": ObjectId("52ffc4a5d85242602e000000"),
  "building": "22 A, Indiana Apt",
  "pincode": 123456,
  "city": "Los Angeles",
  "state": "California"
}

Here is an example of an embedded relationship:

{
  "_id": ObjectId("52ffc33cd85242f436000001"),
  "contact": "987654321",
  "dob": "01-01-1991",
  "name": "Tom Benzamin",
  "address": [
    {
      "building": "22 A, Indiana Apt",
      "pincode": 123456,
      "city": "Los Angeles",
      "state": "California"
    },
    {
      "building": "170 A, Acropolis Apt",
      "pincode": 456789,
      "city": "Chicago",
      "state": "Illinois"
    }
  ]
}

And here is an example of a reference-based relationship:

{
  "_id": ObjectId("52ffc33cd85242f436000001"),
  "contact": "987654321",
  "dob": "01-01-1991",
  "name": "Tom Benzamin",
  "address_ids": [
    ObjectId("52ffc4a5d85242602e000000"),
    ObjectId("52ffc4a5d85242602e000001")
  ]
}
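The practical difference between the two designs shows up when the data is read. The sketch below uses Python dicts for the example documents, with ObjectIds written as plain hex strings. It assumes that the second referenced id belongs to the Chicago address, since the slide only shows the Los Angeles address document on its own. The function names are hypothetical.

```python
from typing import Any, Dict, List

# Separate "addresses" collection used by the reference-based design (keyed by _id).
addresses: Dict[str, Dict[str, Any]] = {
    "52ffc4a5d85242602e000000": {"building": "22 A, Indiana Apt", "pincode": 123456,
                                 "city": "Los Angeles", "state": "California"},
    "52ffc4a5d85242602e000001": {"building": "170 A, Acropolis Apt", "pincode": 456789,
                                 "city": "Chicago", "state": "Illinois"},  # assumed contents
}

embedded_user = {"_id": "52ffc33cd85242f436000001", "name": "Tom Benzamin",
                 "address": list(addresses.values())}  # addresses copied inside the user

referencing_user = {"_id": "52ffc33cd85242f436000001", "name": "Tom Benzamin",
                    "address_ids": list(addresses)}    # only the _id values are stored


def cities_embedded(user: Dict[str, Any]) -> List[str]:
    # One document read: the related data is already inside it.
    return [a["city"] for a in user["address"]]


def cities_referenced(user: Dict[str, Any], address_collection: Dict[str, Dict[str, Any]]) -> List[str]:
    # One extra lookup per reference: an application-side "join".
    return [address_collection[i]["city"] for i in user["address_ids"]]


print(cities_embedded(embedded_user), cities_referenced(referencing_user, addresses))
```

Embedding favours data that is read together and owned by one parent. References avoid duplicating data that many documents share, at the cost of extra reads. MongoDB also offers a `$lookup` aggregation stage for performing such joins on the server.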
Other Structure Issues
• NoSQL view: a) tables are unnatural, b) “joins” are evil, c) need to be able to “grep” my data

• DB view: a) tables are a natural/neutral structure, b) data independence lets you precompute joins under the covers, c) this is the price of all the DBMS goodness you get

This is an old debate: object-oriented databases, XML DBs, hierarchical, …
