
06 NoSQL

The document provides an overview of NoSQL databases, highlighting their characteristics, models, and use cases compared to traditional relational databases. It discusses various NoSQL data models including key-value, document, column family, and graph databases, emphasizing their advantages in handling big data and scalability. Additionally, it covers MongoDB as a prominent document database, detailing its features, data types, and querying methods.


NOSQL

Lê Hồng Hải
UET-VNUH

Overview

1 Introduction
2 NoSQL models
3 When to use

Relational databases

• Very good background
• Structured Query Language (SQL)
• ACID
• Strong consistency, concurrency, recovery
• Lots of tools to use, e.g., reporting services, entity frameworks, ...

SQL Databases

Relational databases

• Relational databases were not built for distributed applications
• Joins are expensive
• Hard to scale horizontally
• Expensive (product cost, hardware, maintenance)

Relational databases

• In the relational database model, answering a query often requires joining a large number of tables

Dealing with Big Data and Scalability

• Scaling up becomes a problem when the dataset is just too big
• RDBMSs were not designed to be distributed
• Traditional DBMSs are designed to run well on a "single" machine
◼ Larger volumes of data or operations require upgrading the server with faster CPUs or more memory, known as "scaling up" or "vertical scaling"

NoSQL

• NoSQL stands for:
◼ No Relational
◼ No RDBMS
◼ Not Only SQL
• NoSQL is an umbrella term for all databases and data stores that don't follow the RDBMS principles

NoSQL Definition

From www.nosql-database.org:
Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontally scalable. The original intention has been modern web-scale databases. The movement began early 2009 and is growing rapidly.
Often more characteristics apply as: schema-free, easy replication support, simple API, eventually consistent / BASE (not ACID), a huge data amount, and more.

NoSQL History

Characteristics of NoSQL databases

• Easy and frequent changes to the database
◼ Fast development
◼ Large data volumes (e.g., Google)
◼ Schemaless
• NoSQL solutions are designed to run on clusters or multi-node database solutions
• When not to use:
◼ Financial data
◼ Data requiring strict ACID compliance
◼ Business-critical data

NoSQL is getting more and more popular

NoSQL Data Models

• NoSQL databases are classified into four major data models:
◼ Key-value
◼ Document
◼ Column family
◼ Graph

Key-value data

• The simplest NoSQL databases
• The main idea is the use of a hash table
• Data (values) are accessed by strings called keys
• Data has no required format; a value may have any format

Key/Value stores

• Store data in a schemaless way
• Store data as maps: HashMaps or associative arrays
• Provide very efficient access to data: looking up a value by its key takes constant time on average
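
The map idea above can be sketched with JavaScript's built-in Map. This is only an in-memory illustration: the class name SessionStore and its methods are made up for this example and do not belong to any particular key-value product.

```javascript
// Minimal in-memory key-value store built on the built-in Map.
// SessionStore and its method names are hypothetical, for illustration only.
class SessionStore {
  constructor() {
    this.data = new Map(); // key (string) -> value (any shape)
  }

  put(key, value) {
    this.data.set(key, value);
  }

  get(key) {
    return this.data.get(key); // undefined when the key is absent
  }

  remove(key) {
    return this.data.delete(key); // true if the key existed
  }
}

const store = new SessionStore();
store.put("session:42", { user: "joe", cart: ["book"] });
store.put("session:43", "values need not share a format");

console.log(store.get("session:42").user); // joe
console.log(store.get("missing"));         // undefined
```

The store never inspects the value, which is why a session object and a plain string can sit side by side under different keys.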

Use cases of key-value databases

• Session management
◼ A session-oriented application, such as a web application, starts a session when a user logs in and keeps it active until the user logs out or the session times out
• Shopping cart
◼ An e-commerce website may receive a very high rate of orders during the holiday shopping season
• Caching
◼ You can use a key-value database to store data temporarily for faster retrieval

NoSQL Data Models

1 Key-Value
2 Wide Column
3 Graph
4 Document

Wide column

• Data are stored in a column-oriented way
◼ Data isn't stored as a single table but by column families
◼ The unit of data is a set of key/value pairs
• Identified by a "row key"
• Ordered and sorted based on the row key

Wide column

• Can write data with a large number of (dynamic) columns to a table
• The cartItems part, along with the username key and cardId, is written serially to the data stream
• This makes it quick to retrieve the data during customer purchases

Cassandra: wide column

• Cassandra stands out for being able to write and read at any node in the cluster, and especially for its write speed

Data on Cluster

• The node that holds a piece of data is determined from its partition key
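
One way to picture this is a toy placement rule: hash the partition key and take the result modulo the number of nodes. The hash (an FNV-1a style function over character codes), the node names, and the modulo rule are assumptions for illustration; a real system such as Cassandra uses its own partitioner and token assignment.

```javascript
// Toy placement rule: hash the partition key, then map the hash to a node.
// This is an illustration only, not how any specific database places data.
function fnv1a(text) {
  let hash = 0x811c9dc5; // 32-bit FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned 32-bit
  }
  return hash;
}

function nodeFor(partitionKey, nodes) {
  return nodes[fnv1a(partitionKey) % nodes.length];
}

const nodes = ["node-a", "node-b", "node-c"]; // hypothetical node names

// The same key always maps to the same node, so any client can locate the data.
console.log(nodeFor("username:joe", nodes));
console.log(nodeFor("username:joe", nodes) === nodeFor("username:joe", nodes)); // true
```

A plain modulo rule reshuffles most keys when a node is added or removed, which is one reason production systems use more elaborate schemes.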

Cassandra: wide column

Cassandra: wide column

• Some statistics about Facebook Search (using Cassandra)
• MySQL, > 50 GB of data
◼ Average write: ~300 ms
◼ Average read: ~350 ms
• Rewritten with Cassandra, > 50 GB of data
◼ Average write: 0.12 ms
◼ Average read: 15 ms

NoSQL Data Models

1 Key-Value
2 Wide Column
3 Graph
4 Document

Graph Databases

• Nodes: instances of data that represent the objects to be tracked
• Edges: represent the relationships between nodes
• Properties: information associated with nodes

Graph Databases

• While existing relational databases can store these relationships, they navigate them with expensive JOIN operations or cross-lookups, often tied to a rigid schema
• It turns out that "relational" databases handle relationships poorly

Graph Databases

• In a graph database, there are no JOINs or lookups. Relationships are stored natively alongside the data elements (the nodes)
• Everything about the system is optimized for traversing data quickly
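
The idea of storing relationships next to the nodes can be sketched with an adjacency map and a breadth-first traversal. The graph data and the function name hops are invented for this example; a real graph database has its own storage engine and query language.

```javascript
// Each node keeps its own list of outgoing edges, so following a
// relationship is a direct lookup rather than a join. Data is made up.
const graph = new Map([
  ["alice", ["bob", "carol"]],
  ["bob", ["dave"]],
  ["carol", ["dave"]],
  ["dave", ["erin"]],
  ["erin", []],
]);

// Breadth-first search: number of hops from start to target, or -1 if unreachable.
function hops(start, target) {
  const seen = new Set([start]);
  let frontier = [start];
  for (let depth = 0; frontier.length > 0; depth++) {
    if (frontier.includes(target)) return depth;
    const next = [];
    for (const node of frontier) {
      for (const neighbor of graph.get(node) ?? []) {
        if (!seen.has(neighbor)) {
          seen.add(neighbor);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return -1;
}

console.log(hops("alice", "erin")); // 3
console.log(hops("erin", "alice")); // -1 (edges are directed)
```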

Graph databases

• Graph databases address big challenges many of us tackle daily. Modern data problems often involve many-to-many relationships with heterogeneous data, which creates the need to:
◼ Navigate deep hierarchies
◼ Find hidden connections between distant objects
◼ Discover inter-relationships between objects

NoSQL Data Models

1 Key-Value
2 Wide Column
3 Graph
4 Document

Document Databases (Document Store)

• Documents
◼ Loosely structured sets of key/value pairs in documents, e.g., XML, JSON, BSON
◼ Are addressed in the database via a unique key
◼ Documents are treated as a whole, avoiding splitting a document into its constituent name/value pairs
• Notable examples:
◼ MongoDB (used by Foursquare, GitHub, and more)
◼ CouchDB (used by Apple, BBC, Canonical, CERN, and more)

Document Data

JSON document

• Field names let you understand at a glance what kind of data a document holds
• Documents in document databases are self-describing

Document Features

• Flexible schema: the schema is flexible; not all documents in a collection need to have the same fields
• Distributed: documents are largely independent units that can be spread across machines, which is what makes horizontal scaling and distribution of data possible

CAP Theorem: Two out of Three

• CAP theorem: at most two of the three properties (consistency, availability, partition tolerance) can be guaranteed at the same time

Performance

• Every database has its advantages and disadvantages
• NoSQL is a set of concepts, ideas, technologies, and software dealing with
◼ Big data
◼ Sparse unstructured or semi-structured data
◼ High horizontal scalability
◼ Massively parallel processing
• Different applications, goals, targets, and approaches need different NoSQL solutions

MONGODB

1 Introduction
2 Data types
3 Querying
4 Sharding

Terminology

Relational (SQL)   MongoDB      Notes
Database           Database
Table              Collection   Dynamic typing
Index              Index        B-tree (range-based)
Row                Document     Think JSON
Column             Field        Primitive types + arrays, documents

Document Database

• MongoDB documents are similar to JSON objects

MongoDB Document

◼ _id holds an ObjectId
◼ name holds an embedded document that contains the fields first and last
◼ birth and death hold values of the Date type
◼ contribs holds an array of strings
◼ views holds a value of the NumberLong type

The _id Field

• In MongoDB, each document stored in a collection requires a unique _id field that acts as a primary key
• If an inserted document omits the _id field, the MongoDB driver automatically generates an ObjectId for the _id field

Data Types

• Null
◼ The null type can be used to represent both a null value and a nonexistent field:
◼ {"x" : null}
• Boolean
◼ There is a boolean type, which can be used for the values true and false:
◼ {"x" : true}
• Number
◼ The shell defaults to using 64-bit floating-point numbers

Data Types

• String
◼ Any string of UTF-8 characters can be represented using the string type:
◼ {"x" : "foobar"}
• Date
◼ MongoDB stores dates as 64-bit integers representing milliseconds since the Unix epoch (January 1, 1970). The time zone is not stored:
◼ {"x" : new Date()}
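
The "milliseconds since the epoch" representation can be seen with plain JavaScript Date objects, the same kind of object that new Date() creates in the shell. The snippet below is a small sketch that runs without a database; the variable names are chosen for this example.

```javascript
// A JavaScript Date wraps a count of milliseconds since the Unix epoch (UTC).
const epoch = new Date(0);
console.log(epoch.toISOString()); // 1970-01-01T00:00:00.000Z

// Midnight UTC on 1 January 2020, built without relying on the local time zone.
const newYear = new Date(Date.UTC(2020, 0, 1));
console.log(newYear.getTime()); // 1577836800000

// The same instant can be rebuilt from the stored number alone.
const restored = new Date(newYear.getTime());
console.log(restored.toISOString()); // 2020-01-01T00:00:00.000Z
```

Because only the millisecond count is kept, an application that needs the user's time zone has to store it in a separate field.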

Data Types

• Array
◼ Sets or lists of values can be represented as arrays:
◼ {"x" : ["a", "b", "c"]}
• Embedded document
◼ Documents can contain entire documents embedded as values in a parent document:
◼ {"x" : {"foo" : "bar"}}
• Object ID
◼ An object ID is a 12-byte ID for documents:
◼ {"x" : ObjectId()}

The advantages of using documents

• Embedded documents and arrays reduce the need for expensive joins
• Dynamic schemas are supported
• MongoDB stores data records as documents (specifically BSON documents), which are gathered together in collections
• The maximum BSON document size is 16 MB

Inserting Documents

• To insert a single document, use the collection's insertOne method:
db.movies.insertOne({"title" : "Stand by Me"})
• insertOne will add an "_id" key to the document (if you do not supply one) and store the document in MongoDB

insertMany

• This method enables you to pass an array of documents to the database
◼ db.movies.insertMany([{"title" : "Ghostbusters"}, {"title" : "E.T."}, {"title" : "Blade Runner"}]);

Removing Documents

• The CRUD API provides deleteOne and deleteMany for this purpose. Both of these methods take a filter document as their first parameter
◼ db.movies.deleteOne({"_id" : 4})
• To delete all the documents that match a filter, use deleteMany:
◼ db.movies.deleteMany({"year" : 1984})

Updating Documents

• Once a document is stored in the database, it can be changed using one of several update methods: updateOne, updateMany, and replaceOne
◼ updateOne and updateMany each take a filter document as their first parameter and a modifier document as the second parameter
◼ replaceOne also takes a filter as the first parameter, but as the second parameter it expects a document with which it will replace the document matching the filter

Update Operators

• "$set" sets the value of a field. If the field does not yet exist, it will be created
• For example, if a user wanted to store his favorite book in his profile, he could add it using "$set":
◼ db.users.updateOne({"name" : "joe"}, {"$set" : {"favorite book" : "Green Eggs and Ham"}})

Update Operators

• You can remove the key altogether with "$unset"
◼ db.users.updateOne({"name" : "joe"}, {"$unset" : {"favorite book" : 1}})
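
To make the effect of these two operators concrete, here is a plain-JavaScript sketch that applies a modifier document to an in-memory object. The function applyUpdate is hypothetical and handles only top-level fields; it imitates the observable result of "$set" and "$unset" and is not MongoDB's implementation.

```javascript
// Apply a modifier document containing $set and/or $unset to a copy of doc.
// Hypothetical helper for illustration; only top-level fields are handled.
function applyUpdate(doc, update) {
  const result = { ...doc };
  for (const [field, value] of Object.entries(update.$set ?? {})) {
    result[field] = value; // creates the field if it does not exist yet
  }
  for (const field of Object.keys(update.$unset ?? {})) {
    delete result[field]; // removes the key altogether
  }
  return result;
}

const joe = { name: "joe", age: 30 };
const withBook = applyUpdate(joe, { $set: { "favorite book": "Green Eggs and Ham" } });
const withoutBook = applyUpdate(withBook, { $unset: { "favorite book": 1 } });

console.log(withBook["favorite book"]);      // Green Eggs and Ham
console.log("favorite book" in withoutBook); // false
```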

MONGODB

1 Introduction
2 Data types
3 Querying
4 Sharding

Introduction to find

• The find method is used to perform queries in MongoDB. Querying returns a subset of documents in a collection
◼ db.users.find({"age" : 27})
• Multiple conditions can be combined (all of them must match) by adding more key/value pairs to the query document
◼ db.users.find({"username" : "joe", "age" : 27})

Query Criteria

• Queries can go beyond exact matching
• "$lt", "$lte", "$gt", and "$gte" are comparison operators, corresponding to <, <=, >, and >=, respectively
• They can be combined to look for a range of values
◼ db.users.find({"age" : {"$gte" : 18, "$lte" : 30}})
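
The way a query document combines exact matches and comparison operators can be sketched as a small matcher over in-memory objects. The function matches and the sample data are assumptions made for this example; it covers only the four operators listed above and is not how MongoDB evaluates queries internally.

```javascript
// Comparison operators from the slide, mapped to plain JavaScript comparisons.
const comparators = {
  $lt: (a, b) => a < b,
  $lte: (a, b) => a <= b,
  $gt: (a, b) => a > b,
  $gte: (a, b) => a >= b,
};

// Returns true when doc satisfies every condition in the query document.
function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    const value = doc[field];
    if (condition !== null && typeof condition === "object") {
      // Operator document such as {"$gte": 18, "$lte": 30}: all parts must hold.
      return Object.entries(condition).every(([op, operand]) => {
        if (!(op in comparators)) throw new Error("unsupported operator: " + op);
        return comparators[op](value, operand);
      });
    }
    return value === condition; // exact match
  });
}

const users = [
  { name: "ann", age: 17 },
  { name: "joe", age: 27 },
  { name: "bob", age: 30 },
  { name: "sue", age: 45 },
];

const inRange = users.filter((u) => matches(u, { age: { $gte: 18, $lte: 30 } }));
console.log(inRange.map((u) => u.name)); // [ 'joe', 'bob' ]
```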

OR query

• There are two ways to do an OR query in MongoDB. "$in" can be used to query for a variety of values for a single key
• "$or" is more general; it can be used to query for any of the given values across multiple keys
◼ db.inventory.find( { $or: [ { status: "A" }, { qty: { $lt: 30 } } ] } )

$not

• "$not" is a metaconditional: it can be applied on top of any other criteria

Querying Arrays

• Querying for elements of an array is designed to behave the way querying for scalars does. For example, if the array is a list of fruits, like this:
db.food.insertOne({"fruit" : ["apple", "banana", "peach"]})
• The following query will successfully match the document:
db.food.find({"fruit" : "banana"})

Querying on Embedded Documents

{
    "name" : {
        "first" : "Joe",
        "last" : "Schmoe"
    },
    "age" : 45
}
db.people.find({"name.first" : "Joe", "name.last" : "Schmoe"})

aggregate() Method

• The aggregate() method uses the aggregation pipeline to process documents into aggregated results
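
What a grouping stage with accumulators produces can be imitated in plain JavaScript. The sketch below groups in-memory documents by one field and computes a sum, a count, and an average per group; the function groupTotals, the field names, and the sample orders are invented for this example and are not part of the MongoDB API.

```javascript
// Sample documents (made up for this example).
const orders = [
  { customer: "joe", amount: 10 },
  { customer: "ann", amount: 5 },
  { customer: "joe", amount: 30 },
];

// Group docs by keyField and accumulate sum, count, and avg of valueField.
function groupTotals(docs, keyField, valueField) {
  const groups = new Map();
  for (const doc of docs) {
    const group = groups.get(doc[keyField]) ?? { sum: 0, count: 0 };
    group.sum += doc[valueField];
    group.count += 1;
    groups.set(doc[keyField], group);
  }
  // One output document per group, keyed by _id as in grouped results.
  return [...groups].map(([key, g]) => ({
    _id: key,
    sum: g.sum,
    count: g.count,
    avg: g.sum / g.count,
  }));
}

const totals = groupTotals(orders, "customer", "amount");
console.log(totals);
```

Here the two orders for "joe" collapse into one result with sum 40, count 2, and avg 20, while "ann" gets sum 5, count 1, and avg 5.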

Example

https://www.geeksforgeeks.org/aggregation-in-mongodb/

Accumulators

• sum: sums the numeric values for the documents in each group
• count: counts the total number of documents
• avg: calculates the average of all given values from all documents
• min: gets the minimum value from all the documents
• max: gets the maximum value from all the documents
• first: gets the first document from the grouping
• last: gets the last document from the grouping

MONGODB

1 Introduction
2 Data types
3 Querying
4 Sharding

Sharding

• Sharding refers to the process of splitting data up across machines; the term partitioning is also sometimes used to describe this concept
• It becomes possible to store more data and handle more load

When to Shard

• Increase available RAM
• Increase available disk space
• Reduce load on a server
• Read or write data with greater throughput than a single mongod can handle

MongoDB Sharding

• MongoDB supports autosharding, which tries to both abstract the architecture away from the application and simplify the administration of such a system
• MongoDB automates balancing data across shards and makes it easier to add and remove capacity

MongoDB Sharding

Sharding on a Single-Machine Cluster

• We'll start by setting up a quick cluster on a single machine. First, start a mongo shell with the --nodb and --norc options:
$ mongo --nodb --norc
• Run the following in the mongo shell you just launched

Connect to Mongos

• Next, you'll connect to the mongos to play around with the cluster. Start another mongo shell:
$ mongo --nodb
• Use this shell to connect to your cluster's mongos
• Again, your mongos should be running on port 20009:
db = (new Mongo("localhost:20009")).getDB("accounts")

Sharding on a Single-Machine Cluster

• Start by inserting some data:
> for (var i=0; i<10000; i++) {
    db.users.insert({"username" : "user"+i, "created_at" : new Date()});
  }
> db.users.count()
10000
• As you can see, interacting with mongos works the same way as interacting with a standalone server does
• You can get an overall view of your cluster by running sh.status(). It will give you a summary of your shards, databases, and collections

Enable Sharding

• To shard a particular collection, first enable sharding on the collection's database:
sh.enableSharding("accounts")
• When you shard a collection, you choose a shard key. For example, if you chose to shard on "username", MongoDB would break up the data into ranges of usernames
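
Breaking data into ranges of usernames can be pictured as a lookup in a table of chunk boundaries. The boundaries, shard names, and the function shardFor below are hypothetical values chosen for this sketch; MongoDB chooses and moves its own chunk boundaries.

```javascript
// Hypothetical chunk table: each chunk covers usernames in [min, max).
const chunks = [
  { min: "", max: "h", shard: "shard0" },
  { min: "h", max: "p", shard: "shard1" },
  { min: "p", max: "\uffff", shard: "shard2" },
];

// Find the chunk whose range contains the username and return its shard.
function shardFor(username) {
  const chunk = chunks.find((c) => username >= c.min && username < c.max);
  return chunk ? chunk.shard : undefined;
}

console.log(shardFor("alice"));  // shard0
console.log(shardFor("joe"));    // shard1
console.log(shardFor("user42")); // shard2
```

Because neighbouring usernames fall into the same chunk, a query for a range of usernames only needs to visit the shards whose chunks overlap that range.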

Sharding a Collection

• To even create a shard key, the field(s) must be indexed. You have to create an index on the key you want to shard by:
db.users.createIndex({"username" : 1})
• Now you can shard the collection by "username":
sh.shardCollection("accounts.users", {"username" : 1})
• The collection has been split up into 13 chunks, where each chunk is a subset of your data

Shard key

• Sharding is per-collection and range-based
• The highest-impact choice you make is the shard key:
◼ Random keys: good for writes, bad for reads
◼ Right-aligned index: bad for writes
◼ Small number of discrete keys: very bad
• Ideal: balance writes, make reads routable by mongos. Optimal shard key selection is hard

Choosing a Shard Key

• The most common ways people choose to split their data are via:
◼ Ascending keys
◼ Random keys
◼ Location-based keys

Ascending Shard Keys

• Ascending shard keys are generally something like a "date" field or ObjectId: anything that steadily increases over time

Randomly Distributed Shard Keys

• Randomly distributed keys could be usernames, email addresses, UUIDs, MD5 hashes, or any other key that has no identifiable pattern in your dataset
• As writes are randomly distributed, the shards should grow at roughly the same rate, limiting the number of migrations that need to occur

Hashed Shard Key

• A hashed shard key can make any field randomly distributed, so it is a good choice when you want writes spread evenly across shards
• The trade-off is that you can never do a targeted range query with a hashed shard key. If you will not be doing range queries, though, hashed shard keys are a good option
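
The effect of hashing can be sketched by counting where steadily increasing keys land once they are hashed. The hash function (an FNV-1a style function), the shard count, and the modulo placement are assumptions for illustration only; MongoDB uses its own hash function and chunk ranges over the hashed values.

```javascript
// 32-bit FNV-1a style hash over character codes (illustrative choice only).
function hash32(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

const shardCount = 3; // hypothetical cluster size
const counts = new Array(shardCount).fill(0);

// Keys "user0", "user1", ... increase steadily, yet their hashes do not,
// so consecutive inserts are spread over all shards.
for (let i = 0; i < 3000; i++) {
  counts[hash32("user" + i) % shardCount] += 1;
}

console.log(counts); // number of keys that landed on each shard
```

The same scattering is what rules out targeted range queries: keys that are adjacent in the original order end up on unrelated shards.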

Hashed Shard Key

• To create a hashed shard key, first create a hashed index:
> db.users.createIndex({"username" : "hashed"})
• Next, shard the collection with:
> sh.shardCollection("app.users", {"username" : "hashed"})

Location-Based Shard Keys

• A location-based key is a key where documents with some similarity fall into a range based on this field
• This can be handy for both putting data close to its users and keeping related data together on disk

Sharding setup example

                Primary Data Center         Secondary Data Center
Shard 1         Priority 1    Priority 1    Priority 0
Shard 2         Priority 1    Priority 1    Priority 0
Shard 3         Priority 1    Priority 1    Priority 0
Config servers  Config 1      Config 2      Config 3

THANK YOU
