BDA Module 5 - Part1 (No SQL) 2023

The document provides an overview of NoSQL databases, detailing their characteristics, advantages, and types, including key-value, column-oriented, document, and graph databases. It discusses the benefits of NoSQL, such as scalability and flexibility, while also addressing challenges like lack of ACID compliance and expertise. Additionally, it covers distribution models, the CAP theorem, and the BASE consistency model as alternatives to traditional ACID transactions.

Uploaded by

shubyadav1010


Big Data Analytics

Module V-Part 1

Introduction to NoSQL Data Management

Dr. Jagadamba G
Dept. of ISE, SIT, Tumakuru
Learning Objectives and Learning Outcomes

Learning Objectives
The big data technology landscape:
1. What are NoSQL databases?
2. Why NoSQL?
3. Key advantages of NoSQL.

Learning Outcomes
a) To understand the significance of NoSQL databases.
b) To understand the need for NewSQL.

Source: Big Data and Analytics by Seema Acharya and Subhashini Chellappan
Introduction to NoSQL (Not Only SQL)

• NoSQL provides a platform for schema-free databases that can handle large amounts of data.
• These databases are scalable, highly available, support replication, and are distributed and often open source.
• Before developing applications that interact with NoSQL databases, we should understand the need to keep data management separate from data storage in these databases.
• NoSQL focuses on high-performance, scalable data storage and provides low-level access to the data management layer.
• This allows data management tasks to be written easily in any programming language.
Why NoSQL?
NoSQL:
• Non-relational data storage systems
• No fixed table schema
• No joins
• No multi-document transactions
• Relaxes one or more ACID properties


Benefits of NoSQL Databases
• Scalable
• Simple data model
• Streaming/Volume
• Reliability
• Schema-less
• Rapid development
• Flexible
• Cheaper than RDBMS
• Creates a caching layer
• Wide data type variety
• Uses large binary objects for storing large data
• Bulk upload
• Graphs
• Lower administration
• Distributed storage
• Real-time analysis

Challenges against NoSQL
• ACID transactions
• Cannot use SQL
• Ecosystem/tools/add-ons
• Cannot perform searches
• Data loss
• No referential integrity
• Lack of availability of expertise
Characteristics of NoSQL
•More than rows in tables: NoSQL systems store and retrieve data in many formats, including key-value stores, graph databases, column-family (Bigtable) stores, document stores, and even rows in tables.
•Free of joins: NoSQL systems allow you to extract your data using simple interfaces without joins.
•Schema-free: NoSQL systems allow you to drag-and-drop your data into a folder and then query it without creating an entity-relationship model.
•Works on many processors: NoSQL systems allow you to store your database on multiple processors and maintain high-speed performance.
•Uses shared-nothing commodity computers: Most (but not all) NoSQL systems leverage low-cost commodity processors that have separate RAM and disk.
•Supports linear scalability: When you add more processors, you get a consistent increase in performance.
•Innovative: NoSQL offers alternatives to a single way of storing, retrieving, and manipulating data. NoSQL supporters (also known as NoSQLers) have an inclusive attitude about NoSQL and recognize SQL solutions as viable options. To the NoSQL community, NoSQL means “Not only SQL.”
History of NoSQL

• The term NoSQL was first used by Carlo Strozzi in 1998.
• It started as a mechanism for data storage and retrieval.
• Eric Evans reintroduced the term NoSQL in 2009.
• NoSQL databases include MongoDB, Cassandra, Redis, HBase, Splunk, Neo4j, CouchDB, etc.
Types of NoSQL

• Key-value data model: Riak, Redis, Membase
• Column-oriented data model: Cassandra, HBase, HyperTable
• Document data model: MongoDB, CouchDB, RavenDB
• Graph data model: InfiniteGraph, Neo4j, AllegroGraph
1. Key-value Data Model
• A key-value database (also known as a key-value store) is a type of NoSQL database that uses a simple key/value method to store data.
• The key-value part refers to the fact that the database stores data as a collection of key/value pairs. This is a simple method of storing data, and it is known to scale well.
• The key-value pair is a well-established concept in many programming languages. Programming languages typically refer to a collection of key-value pairs as an associative array, which is also commonly called a dictionary or hash.
• Example: Phone directory
Key Value
Bob (123) 456-7890
Jane (234) 567-8901
Tara (345) 678-9012
Tiara (456) 789-0123
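The phone directory above maps directly onto a dictionary. The following is a minimal sketch in Python, not the API of any particular key-value DBMS; the helper names `put`, `get`, and `delete` and the extra entry for "Sam" are assumptions made for illustration.

```python
# The phone directory from the slide, held in a plain Python dict.
phone_directory = {
    "Bob": "(123) 456-7890",
    "Jane": "(234) 567-8901",
    "Tara": "(345) 678-9012",
    "Tiara": "(456) 789-0123",
}

def put(store, key, value):
    """Insert the value under key, overwriting any existing value."""
    store[key] = value

def get(store, key, default=None):
    """Return the value for key, or default if the key is absent."""
    return store.get(key, default)

def delete(store, key):
    """Remove key if present; return True if something was removed."""
    return store.pop(key, None) is not None

# "Sam" and this number are hypothetical, added only to show a write.
put(phone_directory, "Sam", "(567) 890-1234")
jane_number = get(phone_directory, "Jane")
missing = get(phone_directory, "Nobody", "not found")
bob_removed = delete(phone_directory, "Bob")
```

Every operation goes through the key, which is why this model is simple to distribute: a store only needs to know which node holds a given key.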
The Key
• The key in a key-value pair must (or at least, should) be unique. This is the
unique identifier that allows you to access the value associated with that
key.
• In theory, the key could be anything. But this may depend on the DBMS.
One DBMS may impose limitations while another may impose none.
• However, for performance reasons, you should avoid having a key that’s
too long. But too short can cause readability issues too. In any case, the key
should follow an agreed convention in order to keep things consistent.
The Value
• The value in a key-value store can be anything, such as text (long or short), a number, markup code such as HTML, programming code such as PHP, an image, etc.
• The value could also be a list, or even another key-value pair encapsulated in an object.
• Some key-value DBMSs allow you to specify a data type for the value. For example, you could specify that the value should be an integer. Other DBMSs don’t provide this functionality and therefore, the value could be of any type.
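The difference between a store that accepts any value and one that enforces a value type can be sketched as follows. The class `KeyValueStore`, its `value_type` parameter, and the colon-separated key convention (`user:1001:profile`) are hypothetical and chosen only for illustration; real DBMSs expose this differently.

```python
class KeyValueStore:
    """Toy in-memory store; value_type=None accepts values of any type."""

    def __init__(self, value_type=None):
        self._data = {}
        self._value_type = value_type

    def put(self, key, value):
        # Reject the write if a value type was declared and does not match.
        if self._value_type is not None and not isinstance(value, self._value_type):
            raise TypeError(f"value for {key!r} must be {self._value_type.__name__}")
        self._data[key] = value

    def get(self, key):
        return self._data[key]

# Untyped store: markup, a list, and a nested key-value object all fit.
untyped = KeyValueStore()
untyped.put("page:home:html", "<h1>Welcome</h1>")
untyped.put("user:1001:tags", ["admin", "editor"])
untyped.put("user:1001:profile", {"name": "Jane", "age": 30})

# Typed store: only integers are accepted.
counters = KeyValueStore(value_type=int)
counters.put("page:home:views", 42)
try:
    counters.put("page:home:views", "forty-two")
    rejected = None
except TypeError as err:
    rejected = str(err)
```

The keys follow one agreed naming convention throughout, which keeps them short and readable at the same time.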
Examples of Key-Value Database Management Systems
• Redis
• Oracle NoSQL Database
• Voldemort
• Aerospike
• Oracle Berkeley DB
2. Column-oriented Data model
• In this model, data is stored in cells grouped into columns rather than into rows.
• Columns are logically grouped into column families. Column families can contain a virtually unlimited number of columns, which can be created at runtime or while defining the schema.
• Reads and writes are done by column rather than by row.
• Column families are groups of similar data that are usually accessed together. As an example, we often access customers’ names and profile information at the same time, but not the information on their orders.
• The main advantages of storing data in columns over a relational DBMS are fast search/access and data aggregation.
• Each column family can be compared to a container of rows in an RDBMS table, where
the key identifies the row and the row consists of multiple columns. The difference is
that various rows do not have to have the same columns, and columns can be added to
any row at any time without having to add them to other rows.
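The row-key-to-columns structure described above can be sketched with nested dictionaries. This is only an in-memory illustration under assumed names (`profile`, `read_column`, and the customer data are hypothetical); it does not reflect how Cassandra or HBase lay data out on disk.

```python
from collections import defaultdict

# One column family: row key -> {column name -> value}.
profile = defaultdict(dict)

profile["cust-1"]["name"] = "Bob"
profile["cust-1"]["city"] = "Tumakuru"
profile["cust-2"]["name"] = "Jane"
# A column added to one row only; other rows are unaffected.
profile["cust-2"]["loyalty_tier"] = "gold"

def read_column(column_family, column):
    """Return {row_key: value} for the rows that actually hold the column."""
    return {
        row_key: columns[column]
        for row_key, columns in column_family.items()
        if column in columns
    }

names = read_column(profile, "name")
tiers = read_column(profile, "loyalty_tier")
```

Rows stay sparse: `cust-1` never stores an empty `loyalty_tier`, which is the difference from an RDBMS table where every row has every column.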
Use cases for the column-oriented data model

• Content management systems
• Blogging platforms
• Systems that maintain counters
• Services that have expiring usage
• Systems that require heavy write requests (like log aggregators)
3. Document Data Model
• Documents can be stored in many formats, such as XML, JSON, and BSON.
• These are self-describing, hierarchical tree data structures that can contain maps, collections, and scalar values.
• Document databases store documents in the value part of the key/value store.
• To ease the transition from relational databases, document databases provide indexing, searching, and similar features.
• They provide good performance and scalability, but typically do not provide ACID guarantees or enforce data integrity.
• A document database is not a replacement for a relational database, but an alternative.
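A document collection with searching and indexing can be sketched with plain Python dictionaries and the standard `json` module. The product documents and the helpers `find` and `build_index` are hypothetical; this is not the query API of MongoDB or CouchDB.

```python
import json

# Each document is a self-describing tree; documents need not share fields.
products = {
    "p1": {"name": "Laptop", "price": 900,
           "specs": {"ram_gb": 16, "ports": ["usb-c", "hdmi"]}},
    "p2": {"name": "Phone", "price": 500, "specs": {"ram_gb": 8}},
    "p3": {"name": "Cable", "price": 10},
}

def find(collection, predicate):
    """Return the ids of documents for which predicate(doc) is true."""
    return [doc_id for doc_id, doc in collection.items() if predicate(doc)]

def build_index(collection, field):
    """Map each value of a top-level field to the ids of documents holding it."""
    index = {}
    for doc_id, doc in collection.items():
        if field in doc:
            index.setdefault(doc[field], []).append(doc_id)
    return index

# Search inside the nested structure, tolerating documents without "specs".
big_ram = find(products, lambda d: d.get("specs", {}).get("ram_gb", 0) >= 16)
by_name = build_index(products, "name")
as_json = json.dumps(products["p1"], sort_keys=True)
```

Unlike a pure key-value store, the database can look inside the value, which is what makes searching and indexing on document fields possible.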
Examples of Document Data Model

• MongoDB
• CouchDB
• Terrastore
• OrientDB
• RavenDB
• Lotus Notes

Note: Couchbase now offers ACID transactions.

4. Graph-based NoSQL database
• It is designed to handle very large datasets and is capable of integrating heterogeneous data from many sources and making links between datasets.
• It focuses on the relationships between entities and is able to infer new knowledge out of existing information.
• It is built upon the Entity-Attribute-Value model.
• Entities are also known as nodes, which have properties.
• It is a very flexible way to describe how data relates to other data.
• Nodes store data about each entity in the database, relationships describe a relationship between nodes, and a property is simply the node on the opposite end of the relationship.
• A traditional database stores a description of each possible relationship in foreign key fields or junction tables, whereas a graph database allows relationships to be defined between any nodes as needed.
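Nodes, relationships, and following links between them can be sketched as below. The people, the relationship types `KNOWS` and `WORKS_AT`, and the helper functions are all hypothetical; a real graph database such as Neo4j uses its own storage and query language rather than Python lists.

```python
from collections import deque

# Nodes with their properties.
nodes = {
    "alice": {"label": "Person"},
    "bob": {"label": "Person"},
    "carol": {"label": "Person"},
    "acme": {"label": "Company"},
}

# Relationships as (source node, relationship type, target node).
relationships = [
    ("alice", "KNOWS", "bob"),
    ("bob", "KNOWS", "carol"),
    ("carol", "WORKS_AT", "acme"),
]

def neighbours(node):
    """Nodes reachable from node by following one outgoing relationship."""
    return [dst for src, _, dst in relationships if src == node]

def shortest_path(start, goal):
    """Breadth-first search; returns the nodes on a shortest path, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbours(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

route = shortest_path("alice", "acme")
no_route = shortest_path("acme", "alice")
```

No join table is consulted here: a new kind of relationship is added by appending one more tuple, without changing any schema.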
Examples of Graph-based NoSQL databases

• Neo4j
• InfoGrid
• InfiniteGraph

Note: A Fortune 500 financial services company uses Neo4j to identify potential fraud more quickly, stopping millions of fraudulent transactions.
With the advent of the NoSQL movement, businesses of all sizes have a
variety of modern options from which to build solutions relevant to their use
cases.

• Calculating average income? Ask a relational database.

• Building a shopping cart? Use a key-value store.

• Storing structured product information? Store as a document.

• Describing how a user got from point A to point B? Follow a graph.

Advantages of NoSQL

• Cheap, easy to implement
• Easy to distribute
• Can easily scale up and down
• Relaxes the data consistency requirement
• Doesn’t require a pre-defined schema
• Data can be replicated to multiple nodes and can be partitioned
NoSQL Vendors

Company | Product | Most widely used by
Amazon | DynamoDB | LinkedIn, Mozilla
Facebook | Cassandra | Netflix, Twitter, eBay
Google | BigTable | Adobe Photoshop

Materialized View
• A materialized view differs from a normal view in that its result is stored. It is used in some environments where the source data is in a format that is not suitable for querying.
• These views are disk-based and are updated periodically, as the queries require.
• It has a storage cost associated with it.
• It has an update cost associated with it.
• There is no SQL standard for defining a materialized view; the functionality is provided by some database systems as an extension.
• Materialized views are efficient when the view is accessed frequently, because storing the results beforehand saves computation time, i.e., they suit cases where response time must be very fast.
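The trade-off between stored results and update cost can be sketched as follows. The class `MaterializedTotals` and the order data are hypothetical; real systems refresh such views on a schedule or on write, which this toy version leaves to an explicit `refresh()` call.

```python
# Source data that is awkward to query directly for per-customer totals.
orders = [
    {"customer": "Bob", "amount": 120},
    {"customer": "Jane", "amount": 80},
    {"customer": "Bob", "amount": 30},
]

class MaterializedTotals:
    """Stores per-customer totals; results are only as fresh as the last refresh()."""

    def __init__(self, source):
        self._source = source
        self._totals = {}
        self.refresh()

    def refresh(self):
        """Recompute the stored result from the source (the update cost)."""
        totals = {}
        for order in self._source:
            name = order["customer"]
            totals[name] = totals.get(name, 0) + order["amount"]
        self._totals = totals

    def total_for(self, customer):
        """Cheap lookup against the stored result (paid for with storage)."""
        return self._totals.get(customer, 0)

view = MaterializedTotals(orders)
before = view.total_for("Bob")
orders.append({"customer": "Bob", "amount": 50})
stale = view.total_for("Bob")   # the view has not been refreshed yet
view.refresh()
fresh = view.total_for("Bob")
```

Between the write and the refresh the view returns the old total, which is the price of answering frequent reads without recomputing.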
Distribution models
There are two styles of distributing data:
• Sharding distributes different datasets across different sites. It provides horizontal scalability, which helps reduce the workload on each server.
• Replication copies the same data across different sites.
• Sharding improves both read and write performance, while replication improves read performance but not write performance.
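The difference in how reads and writes are routed can be sketched as below. Both classes are hypothetical in-memory toys; hashing the key with MD5 to pick a shard and reading replicas in round-robin order are assumptions made for illustration, not the scheme of any specific product.

```python
import hashlib

def shard_for(key, num_shards):
    """Pick a shard from a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

class ShardedStore:
    """Each key lives on exactly one shard, so a write touches one site."""

    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)

class ReplicatedStore:
    """Every replica holds all the data, so a write touches every site."""

    def __init__(self, num_replicas):
        self.replicas = [{} for _ in range(num_replicas)]
        self._next = 0

    def put(self, key, value):
        for replica in self.replicas:
            replica[key] = value

    def get(self, key):
        # Reads are spread across replicas in round-robin order.
        replica = self.replicas[self._next % len(self.replicas)]
        self._next += 1
        return replica.get(key)

sharded = ShardedStore(3)
replicated = ReplicatedStore(3)
for name in ["Bob", "Jane", "Tara", "Tiara"]:
    sharded.put(name, name.upper())
    replicated.put(name, name.upper())
```

A sharded write costs one update while a replicated write costs one update per replica, which is why replication helps reads but not writes.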
CAP theorem
The CAP theorem applies to distributed systems. It states that a distributed system can deliver only two of three desired characteristics: Consistency, Availability, and Partition tolerance (the ‘C,’ ‘A’ and ‘P’ in CAP).

• Consistency: Consistency means that all clients see the same data at the same time, no matter which node they connect to. For this to happen, whenever data is written to one node, it must be instantly forwarded or replicated to all the other nodes in the system before the write is deemed ‘successful.’
• Availability: Availability means that any client making a request for data gets a response, even if one or more nodes are down. Put another way, all working nodes in the distributed system return a valid response for any request, without exception.
• Partition tolerance: A partition is a communications break within a distributed system, that is, a lost or temporarily delayed connection between two nodes. Partition tolerance means that the cluster must continue to work despite any number of communication breakdowns between nodes in the system.
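The choice a system faces during a partition can be sketched with a toy two-node cluster. The class `TwoNodeCluster`, the `"CP"`/`"AP"` mode flag, and the `Unavailable` exception are hypothetical; real systems use quorums and conflict resolution, which this simplification leaves out.

```python
class Unavailable(Exception):
    """Raised when a node refuses a request to preserve consistency."""

class TwoNodeCluster:
    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.nodes = [{}, {}]
        self.partitioned = False

    def write(self, node_id, key, value):
        if not self.partitioned:
            # Healthy network: replicate to every node before acknowledging.
            for node in self.nodes:
                node[key] = value
            return
        if self.mode == "CP":
            # Keep consistency by giving up availability.
            raise Unavailable("cannot replicate during a partition")
        # AP: stay available, accept that the nodes now disagree.
        self.nodes[node_id][key] = value

    def read(self, node_id, key):
        return self.nodes[node_id].get(key)

cp = TwoNodeCluster("CP")
cp.write(0, "x", 1)
cp.partitioned = True
try:
    cp.write(0, "x", 2)
    cp_refused = False
except Unavailable:
    cp_refused = True

ap = TwoNodeCluster("AP")
ap.write(0, "x", 1)
ap.partitioned = True
ap.write(0, "x", 2)
ap_views = (ap.read(0, "x"), ap.read(1, "x"))
```

In the CP cluster both nodes still agree on `x` but a client was turned away; in the AP cluster every client was served but the two nodes return different values.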
ACID Property
• ACID transactions are a very important feature that most relational
databases have had for decades. They enable you to combine a series of
different database operations into one transaction that provides the
following four guarantees:
• Atomicity - that the operations will all either succeed or fail as a single
unit;
• Consistency - that they won’t violate certain constraints you defined for
the data as a whole;
• Isolation - that each operation is hidden from view until the whole
transaction is complete;
• Durability - that all changes to the data are safely persisted.
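Atomicity and consistency can be seen in a small transfer between two accounts, sketched here with Python's built-in `sqlite3` module. The `accounts` table, the names, the balances, and the `transfer` helper are hypothetical; the `CHECK` constraint stands in for "constraints you defined for the data".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates commit together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The debit would make the balance negative, so nothing is kept.
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

ok = transfer(conn, "alice", "bob", 30)
after_ok = balances(conn)
failed = transfer(conn, "alice", "bob", 500)
after_failed = balances(conn)
```

In the failed transfer the credit to `bob` had already been executed when the debit violated the constraint, yet the rollback removes it as well, so no partial transfer is ever visible.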
BASE: an alternative to ACID

• When it comes to NoSQL databases, data consistency models can sometimes be strikingly different from those used by relational databases (as well as quite different from other NoSQL stores).
• The two most common consistency models are known by the acronyms ACID and BASE. While they’re often pitted against each other in a battle for ultimate victory (please someone make a video of that), both consistency models come with advantages and disadvantages, and neither is always a perfect fit.
• In the NoSQL database world, ACID transactions are less fashionable, as some databases have loosened the requirements for immediate consistency, data freshness and accuracy in order to gain other benefits, like scale and resilience.
• Here’s how the BASE acronym breaks down:
• Basically Available: The database appears to work most of the time.
• Soft state: Stores don’t have to be write-consistent, nor do different replicas have to be mutually consistent all the time.
• Eventual consistency: Stores exhibit consistency at some later point (e.g., lazily at read time).
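Soft state and eventual consistency can be sketched with replicas that receive updates late. The class `EventuallyConsistentStore`, its `pending` queue, and the explicit `sync()` step are hypothetical stand-ins for whatever background replication a real store performs.

```python
class EventuallyConsistentStore:
    def __init__(self, num_replicas):
        self.replicas = [{} for _ in range(num_replicas)]
        self.pending = []  # (replica index, key, value) not yet delivered

    def write(self, key, value):
        """Apply to replica 0 at once; queue the update for the other replicas."""
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, replica_index, key):
        """Read from one replica, which may not have seen recent writes."""
        return self.replicas[replica_index].get(key)

    def sync(self):
        """Deliver all queued updates in the order they were written."""
        for index, key, value in self.pending:
            self.replicas[index][key] = value
        self.pending.clear()

store = EventuallyConsistentStore(3)
store.write("cart:42", ["book"])
read_primary = store.read(0, "cart:42")
read_lagging = store.read(2, "cart:42")   # replica 2 has not caught up
store.sync()
read_after_sync = store.read(2, "cart:42")
```

The write is acknowledged immediately (basically available), replicas disagree for a while (soft state), and they converge once the queued updates arrive (eventual consistency).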
Sharding

• Sharding is a partitioning pattern for the NoSQL age.
• Sharding is a method of splitting and storing a single logical dataset in multiple databases.
• Sharding is also referred to as horizontal partitioning. The distinction between horizontal and vertical comes from the traditional tabular view of a database.
• A database can be split vertically, storing different tables and columns in separate databases, or horizontally, storing rows of the same table in multiple database nodes.
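The vertical versus horizontal distinction can be sketched on a small table held as a list of rows. The `contacts` table reuses the phone directory entries with hypothetical `id` values, and both helper functions are assumptions for illustration; splitting rows by `id % num_shards` is just one possible sharding rule.

```python
contacts = [
    {"id": 1, "name": "Bob", "phone": "(123) 456-7890"},
    {"id": 2, "name": "Jane", "phone": "(234) 567-8901"},
    {"id": 3, "name": "Tara", "phone": "(345) 678-9012"},
    {"id": 4, "name": "Tiara", "phone": "(456) 789-0123"},
]

def split_vertically(table, key, column_groups):
    """One partition per column group; each keeps the key column and all rows."""
    return [
        [{col: row[col] for col in (key, *group)} for row in table]
        for group in column_groups
    ]

def split_horizontally(table, key, num_shards):
    """One partition per shard; each keeps all columns but only some rows."""
    shards = [[] for _ in range(num_shards)]
    for row in table:
        shards[row[key] % num_shards].append(row)
    return shards

vertical = split_vertically(contacts, "id", [("name",), ("phone",)])
horizontal = split_horizontally(contacts, "id", 2)
```

Each vertical partition still has four rows but fewer columns, while each horizontal partition (shard) has complete rows but only half of them.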
