BDA Module 5 - Part1 (No SQL) 2023

The document provides an overview of NoSQL databases, detailing their characteristics, advantages, and types, including key-value, column-oriented, document, and graph databases. It discusses the benefits of NoSQL, such as scalability and flexibility, while also addressing challenges like lack of ACID compliance and expertise. Additionally, it covers distribution models, the CAP theorem, and the BASE consistency model as alternatives to traditional ACID transactions.


Big Data Analytics

Module V-Part 1

Introduction to NoSQL Data Management

By

Dr. Jagadamba G
Dept. of ISE, SIT, Tumakuru
Learning Objectives and Learning Outcomes

Learning Objectives
The big data technology landscape:
1. What are NoSQL databases?
2. Why NoSQL?
3. Key advantages of NoSQL.

Learning Outcomes
a) To understand the significance of NoSQL databases.
b) To understand the need for NewSQL.

Big Data and Analytics by Seema Acharya and Subhashini Chellappan
Introduction to NoSQL (Not Only SQL)

• NoSQL provides a platform for schema-free databases that can handle
large amounts of data.
• These databases are scalable, highly available, support replication, and are
distributed and possibly open source.
• Before we develop applications that interact with NoSQL databases, we should
understand the need to keep data management separate from data storage in
these databases.
• NoSQL focuses on high-performance, scalable data storage and provides
low-level access to the data management layer.
• This allows data management tasks to be written easily in any programming
language.
Why NoSQL?
NoSQL:
• Non-relational data storage systems
• No fixed table schema
• No joins
• No multi-document transactions
• Relaxes one or more ACID properties

Benefits of NoSQL Databases
• Scalable
• Simple data model
• Streaming/Volume
• Reliability
• Schema-less
• Rapid development
• Flexible
• Cheaper than RDBMS
• Creates a caching layer
• Wide data type variety
• Uses large binary objects for storing large data
• Bulk upload
• Graphs
• Lower administration
• Distributed storage
• Real-time analysis

Challenges against NoSQL
• ACID transactions
• Cannot use SQL
• Ecosystem/tools/add-ons
• Cannot perform searches
• Data loss
• No referential integrity
• Lack of availability of expertise
Characteristics of NoSQL
• More than rows in tables: NoSQL systems store and retrieve data from many formats: key-value
stores, graph databases, column-family (Bigtable) stores, document stores, and even
rows in tables.
• Free of joins: NoSQL systems allow you to extract your data using simple interfaces
without joins.
• Schema-free: NoSQL systems allow you to drag-and-drop your data into a folder and
then query it without creating an entity-relational model.
• Works on many processors: NoSQL systems allow you to store your database on
multiple processors and maintain high-speed performance.
• Uses shared-nothing commodity computers: Most (but not all) NoSQL systems
leverage low-cost commodity processors that have separate RAM and disk.
• Supports linear scalability: When you add more processors, you get a consistent
increase in performance.
• Innovative: NoSQL offers alternatives to a single way of storing, retrieving, and
manipulating data. NoSQL supporters (also known as NoSQLers) have an inclusive
attitude about NoSQL and recognize SQL solutions as viable options. To the NoSQL
community, NoSQL means “Not only SQL.”
History of NoSQL

• The term NoSQL was coined by Carlo Strozzi in 1998.
• It started as a mechanism for data retrieval and storage.
• Eric Evans reintroduced the term NoSQL in 2009.
• Examples of NoSQL databases are MongoDB, Cassandra, Redis, HBase, Splunk, Neo4j, CouchDB, etc.
Types of NoSQL

Key-value data model    Column-oriented data model    Document data model    Graph data model
• Riak                  • Cassandra                   • MongoDB              • InfiniteGraph
• Redis                 • HBase                       • CouchDB              • Neo4j
• Membase               • HyperTable                  • RavenDB              • AllegroGraph
1. Key value data Model
• A key-value database (also known as a key-value store or key-value store database) is a
type of NoSQL database that uses a simple key/value method to store data.

• The key-value part refers to the fact that the database stores data as a collection of
key/value pairs. This is a simple method of storing data, and it is known to scale well.

• The key-value pair is a well-established concept in many programming languages.
Programming languages typically refer to a key-value as an associative array or data
structure. A key-value is also commonly referred to as a dictionary or hash.
• Example: Phone directory
Key      Value
Bob      (123) 456-7890
Jane     (234) 567-8901
Tara     (345) 678-9012
Tiara    (456) 789-0123
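
The phone directory above can be sketched with a Python dictionary standing in for the database. This is a minimal in-memory illustration, not the API of any particular key-value product; the function names put, get and delete are assumptions chosen for the example.

```python
# A Python dict stands in for the key-value database (illustration only).
phone_directory = {}

def put(key, value):
    """Store value under key, overwriting any existing value."""
    phone_directory[key] = value

def get(key, default=None):
    """Return the value stored under key, or default if the key is absent."""
    return phone_directory.get(key, default)

def delete(key):
    """Remove key if present; return True if something was removed."""
    return phone_directory.pop(key, None) is not None

put("Bob", "(123) 456-7890")
put("Jane", "(234) 567-8901")
put("Tara", "(345) 678-9012")
put("Tiara", "(456) 789-0123")

print(get("Jane"))  # (234) 567-8901
print(get("Zed"))   # None, because the key does not exist
```

Every operation goes through the key; there is no query on the value side, which is part of why the model is simple to distribute.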
The Key
• The key in a key-value pair must (or at least, should) be unique. This is the
unique identifier that allows you to access the value associated with that
key.
• In theory, the key could be anything. But this may depend on the DBMS.
One DBMS may impose limitations while another may impose none.
• However, for performance reasons, you should avoid having a key that’s
too long. But too short can cause readability issues too. In any case, the key
should follow an agreed convention in order to keep things consistent.
The Value

• The value in a key-value store can be anything, such as text (long
or short), a number, markup code such as HTML, programming code
such as PHP, an image, etc.
• The value could also be a list, or even another key-value pair
encapsulated in an object.
• Some key-value DBMSs allow you to specify a data type for the
value. For example, you could specify that the value should be an
integer. Other DBMSs don’t provide this functionality and therefore,
the value could be of any type.
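
The points about keys and values can be sketched together. The "entity:id:field" key convention and the declared_types table below are hypothetical choices made for this illustration; real DBMSs differ in what they allow for keys and whether they type values at all.

```python
store = {}

# Hypothetical: values under the "counter" prefix must be integers,
# mimicking a DBMS that lets you declare a data type for the value.
declared_types = {"counter": int}

def make_key(entity, entity_id, field):
    """Build a key following one agreed convention: entity:id:field."""
    return f"{entity}:{entity_id}:{field}"

def put(key, value):
    """Store value under key, enforcing a declared type if there is one."""
    prefix = key.split(":", 1)[0]
    expected = declared_types.get(prefix)
    if expected is not None and not isinstance(value, expected):
        raise TypeError(
            f"{key!r} expects {expected.__name__}, got {type(value).__name__}"
        )
    store[key] = value

# Values of different shapes: text, a list, a nested object, markup, a number.
put(make_key("user", 42, "name"), "Bob")
put(make_key("user", 42, "phones"), ["(123) 456-7890"])
put(make_key("user", 42, "address"), {"city": "Tumakuru"})
put(make_key("page", "home", "html"), "<h1>Welcome</h1>")
put(make_key("counter", "home", "visits"), 17)

try:
    put(make_key("counter", "home", "visits"), "seventeen")
except TypeError as error:
    print("rejected:", error)
```

Keys stay short but readable, and the typed prefix shows the difference between a store that checks values and one that accepts anything.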
Examples of Key-Value Database Management Systems
• Redis
• Oracle NoSQL Database
• Voldemort
• Aerospike
• Oracle Berkeley DB
2. Column-oriented Data model
• In this model, data is stored in cells grouped into columns rather than rows.
• Columns are logically grouped into column families. Column families can contain a
virtually unlimited number of columns that can be created at runtime or while defining
the schema.
• Reads and writes are done using columns rather than rows.
• Column families are groups of similar data that are usually accessed together. As an
example, we often access customers’ names and profile information at the same time,
but not the information on their orders.
• The main advantages of storing data in columns over relational DBMS are fast
search/access and data aggregation.
• Each column family can be compared to a container of rows in an RDBMS table, where
the key identifies the row and the row consists of multiple columns. The difference is
that various rows do not have to have the same columns, and columns can be added to
any row at any time without having to add them to other rows.
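
The column-family idea can be sketched with nested dictionaries: column family, then row key, then a free set of columns. This is only a model of the layout under assumed names (customers, put, get_row); it does not reproduce how Cassandra or HBase store data on disk.

```python
from collections import defaultdict

# column family -> row key -> {column name: value}
customers = {
    "profile": defaultdict(dict),  # names and profile data, read together
    "orders": defaultdict(dict),   # order data, read separately
}

def put(family, row_key, column, value):
    """Add or update one column of one row; other rows are unaffected."""
    customers[family][row_key][column] = value

def get_row(family, row_key):
    """Return all columns of a row within one column family."""
    return dict(customers[family].get(row_key, {}))

put("profile", "cust1", "name", "Bob")
put("profile", "cust1", "city", "Tumakuru")
put("profile", "cust2", "name", "Jane")
put("profile", "cust2", "email", "jane@example.com")  # only cust2 has this column
put("orders", "cust1", "order:1001", "2 items")

print(get_row("profile", "cust1"))  # {'name': 'Bob', 'city': 'Tumakuru'}
print(get_row("profile", "cust2"))  # {'name': 'Jane', 'email': 'jane@example.com'}
```

Reading a customer's profile never touches the orders family, and the two rows carry different columns without any schema change.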
Example use cases of the column-oriented data model
• Content management systems
• Blogging platforms
• Systems that maintain counters
• Services that have expiring usage
• Systems that require heavy write requests (like log aggregators)
3. Document Data Model
• Documents can be stored in many formats, such as XML, JSON, BSON, etc.
• These are self-describing, hierarchical tree data structures that can
contain maps, collections and scalar values.
• Document databases store documents in the value part of the key/value store.
• For an easier transition from relational databases, document databases provide
indexing, searching, etc.
• They provide good performance and scalability, but typically do not provide ACID
guarantees or data integrity.
• A document database is not a replacement for a relational database, but an
alternative.
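
A document store can be sketched as a key/value store whose values are self-describing, nested documents. The collection dictionary and the insert and find helpers are assumptions for this illustration, not the API of MongoDB or CouchDB.

```python
import json

# document id -> document; the document sits in the value part of a key/value store
collection = {}

def insert(doc_id, document):
    """Store a document under its id."""
    collection[doc_id] = document

def find(field, value):
    """Return ids of documents whose top-level field equals value."""
    return [doc_id for doc_id, doc in collection.items() if doc.get(field) == value]

# Documents in one collection need not share the same fields.
insert("p1", {
    "name": "Laptop",
    "category": "electronics",
    "specs": {"ram_gb": 16, "ports": ["usb-c", "hdmi"]},  # map containing a collection
})
insert("p2", {"name": "Desk", "category": "furniture", "dimensions_cm": [120, 60, 75]})
insert("p3", {"name": "Phone", "category": "electronics"})

print(find("category", "electronics"))         # ['p1', 'p3']
print(json.dumps(collection["p1"], indent=2))  # the document as JSON text
```

Unlike a plain key-value store, the database can look inside the value, which is what makes searching and indexing on fields possible.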
Examples of Document Data model

• MongoDB
• CouchDB
• Terrastore
• OrientDB
• RavenDB
• Lotus Notes

Note: Couchbase now offers ACID Transactions.


4. Graph-based NoSQL database
• It is designed to handle very large sets of data and is capable of
integrating heterogeneous data from many sources and making links
between datasets.
• It focuses on the relationships between entities and is able to infer new
knowledge out of existing information.
• It is built upon the Entity-Attribute-Value model.
• Entities are also known as nodes, which have properties.
• It is a very flexible way to describe how data relates to other data.
• Nodes store data about each entity in the database, relationships
describe a relationship between nodes, and a property is simply the node
on the opposite end of the relationship.
• A traditional database stores a description of each possible
relationship in foreign key fields or junction tables, whereas graph
databases allow virtually any relationship to be defined.
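
The node and relationship model can be sketched with plain Python structures and a breadth-first traversal. The sample people, the KNOWS and WORKS_AT labels, and the helper names are invented for this illustration; a real graph database such as Neo4j has its own storage and query language.

```python
from collections import deque

# Nodes with properties, and directed relationships between them.
nodes = {
    "alice": {"type": "person"},
    "bob": {"type": "person"},
    "carol": {"type": "person"},
    "acme": {"type": "company"},
}
relationships = []  # (source node, relationship label, target node)

def relate(source, label, target):
    """Define a new relationship; no schema change is needed."""
    relationships.append((source, label, target))

def neighbours(node):
    """Return (label, target) pairs for relationships leaving node."""
    return [(label, dst) for src, label, dst in relationships if src == node]

def find_path(start, goal):
    """Breadth-first search: shortest chain of nodes from start to goal."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for _label, nxt in neighbours(path[-1]):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None  # goal is not reachable from start

relate("alice", "KNOWS", "bob")
relate("bob", "KNOWS", "carol")
relate("alice", "WORKS_AT", "acme")
relate("carol", "WORKS_AT", "acme")

print(find_path("alice", "carol"))  # ['alice', 'bob', 'carol']
print(find_path("carol", "alice"))  # None, relationships are directed
```

Answering "how is alice connected to carol" is a traversal over relationships rather than a series of joins across junction tables.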
Examples of Graph-based NoSQL databases

• Neo4j
• InfoGrid
• InfiniteGraph

Note: A Fortune 500 financial services company uses Neo4j to identify potential fraud more quickly,
stopping millions of fraudulent transactions.
With the advent of the NoSQL movement, businesses of all sizes have a
variety of modern options from which to build solutions relevant to their use
cases.

• Calculating average income? Ask a relational database.

• Building a shopping cart? Use a key-value Store.

• Storing structured product information? Store as a document.

• Describing how a user got from point A to point B? Follow a graph.


Advantages of NoSQL
• Cheap, easy to implement
• Easy to distribute
• Can easily scale up & down
• Relaxes the data consistency requirement
• Doesn’t require a pre-defined schema
• Data can be replicated to multiple nodes and can be partitioned
NoSQL Vendors

Company     Product      Most widely used by
Amazon      DynamoDB     LinkedIn, Mozilla
Facebook    Cassandra    Netflix, Twitter, eBay
Google      BigTable     Adobe Photoshop


Materialized View
• A materialized view differs from a normal view in that its results are stored. It is used in
environments where the source data is in a format that is not suitable for querying.
• These views are disk based and are updated periodically as per the requirements of the query.
• It has a storage cost associated with it.
• It has an update cost associated with it.
• There is no SQL standard for defining a materialized view; the functionality is provided by some
database systems as an extension.
• Materialized views are efficient when the view is accessed frequently and response time must be
very fast, as storing the results beforehand saves computation time.
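
The trade-off described above can be sketched in a few lines: the view stores a precomputed result, reads are cheap, and the result is stale until the next refresh. The orders data and the refresh_view function are assumptions for this illustration, kept in memory rather than on disk.

```python
# Source data: awkward to query directly for per-customer totals.
orders = [
    {"customer": "Bob", "amount": 120.0},
    {"customer": "Jane", "amount": 80.0},
    {"customer": "Bob", "amount": 30.0},
]

# The materialized view: stored results (this is the storage cost).
total_per_customer = {}

def refresh_view():
    """Recompute the stored results from the source data (the update cost)."""
    totals = {}
    for order in orders:
        totals[order["customer"]] = totals.get(order["customer"], 0.0) + order["amount"]
    total_per_customer.clear()
    total_per_customer.update(totals)

refresh_view()
print(total_per_customer["Bob"])   # 150.0, read without recomputation

orders.append({"customer": "Jane", "amount": 20.0})
print(total_per_customer["Jane"])  # 80.0, stale until the next refresh

refresh_view()
print(total_per_customer["Jane"])  # 100.0
```

How often refresh_view runs is the design choice: frequent refreshes cost more computation, infrequent ones serve older data.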
Distribution models
There are two styles of distributing data:
• Sharding distributes different datasets across different sites. It provides horizontal
scalability and reduces the workload on each server.
• Replication copies the same data across different sites.
• Sharding improves both read and write performance, while replication
improves read performance but not write performance.
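
The difference in write cost between the two styles can be sketched by counting the writes each node receives. Both classes are toy in-memory models with assumed names; the CRC32-based routing is just one simple way to pick a shard, not how any specific database does it.

```python
import zlib

class ReplicatedStore:
    """Every node holds a full copy of the data."""

    def __init__(self, node_count):
        self.nodes = [{} for _ in range(node_count)]
        self.writes_per_node = [0] * node_count

    def write(self, key, value):
        # Each write must be applied on every node.
        for index, node in enumerate(self.nodes):
            node[key] = value
            self.writes_per_node[index] += 1

    def read(self, key, node_index):
        # Any node can answer, so reads can be spread across nodes.
        return self.nodes[node_index].get(key)

class ShardedStore:
    """Every node holds a different subset of the data."""

    def __init__(self, node_count):
        self.nodes = [{} for _ in range(node_count)]
        self.writes_per_node = [0] * node_count

    def _node_for(self, key):
        # Deterministic routing: the same key always maps to the same node.
        return zlib.crc32(key.encode("utf-8")) % len(self.nodes)

    def write(self, key, value):
        index = self._node_for(key)
        self.nodes[index][key] = value
        self.writes_per_node[index] += 1

    def read(self, key):
        return self.nodes[self._node_for(key)].get(key)

replicated = ReplicatedStore(4)
sharded = ShardedStore(4)
for i in range(100):
    replicated.write(f"user:{i}", i)
    sharded.write(f"user:{i}", i)

print(replicated.writes_per_node)  # every node handled all 100 writes
print(sharded.writes_per_node)     # the 100 writes are divided among the nodes
```

With replication each node still performs every write, so adding nodes helps reads only; with sharding each node sees only its share of both reads and writes.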
CAP theorem
The CAP theorem applies to distributed systems. It states that a distributed
system can deliver only two of three desired characteristics: Consistency,
Availability, and Partition tolerance (the ‘C,’ ‘A’ and ‘P’ in CAP).

• Consistency: Consistency means that all clients see the same data at the same time, no matter
which node they connect to. For this to happen, whenever data is written to one node, it must
be instantly forwarded or replicated to all the other nodes in the system before the write is
deemed ‘successful.’
• Availability: Availability means that any client making a request for data gets a response,
even if one or more nodes are down. Another way to state this: all working nodes in the
distributed system return a valid response for any request, without exception.
• Partition tolerance: A partition is a communications break within a distributed system—a lost
or temporarily delayed connection between two nodes. Partition tolerance means that the
cluster must continue to work despite any number of communication breakdowns between
nodes in the system.
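
The trade-off can be sketched with a two-node cluster and a simulated partition. The Cluster class and its "CP" and "AP" modes are simplifications invented for this illustration: during a partition, one mode refuses writes to stay consistent and the other accepts them to stay available.

```python
class Node:
    def __init__(self):
        self.data = {}

class Cluster:
    """Two nodes that replicate every write while they can communicate."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Node()
        self.b = Node()
        self.partitioned = False  # True simulates a communications break

    def write(self, node, key, value):
        """Return True if the write was accepted."""
        other = self.b if node is self.a else self.a
        if not self.partitioned:
            node.data[key] = value
            other.data[key] = value  # replicate before reporting success
            return True
        if self.mode == "CP":
            return False             # give up availability to stay consistent
        node.data[key] = value       # AP: accept locally, replicas now differ
        return True

    def read(self, node, key):
        return node.data.get(key)

cp = Cluster("CP")
cp.write(cp.a, "x", 1)
cp.partitioned = True
print(cp.write(cp.a, "x", 2))                       # False: write refused
print(cp.read(cp.a, "x"), cp.read(cp.b, "x"))       # 1 1: nodes still agree

ap = Cluster("AP")
ap.write(ap.a, "x", 1)
ap.partitioned = True
print(ap.write(ap.a, "x", 2))                       # True: write accepted
print(ap.read(ap.a, "x"), ap.read(ap.b, "x"))       # 2 1: nodes disagree
```

Since partitions cannot be ruled out in a distributed system, the practical choice during a partition is between these two behaviours.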
ACID Property
• ACID transactions are a very important feature that most relational
databases have had for decades. They enable you to combine a series of
different database operations into one transaction that provides the
following four guarantees:
• Atomicity - that the operations will all either succeed or fail as a single
unit;
• Consistency - that they won’t violate certain constraints you defined for
the data as a whole;
• Isolation - that each operation is hidden from view until the whole
transaction is complete;
• Durability - that all changes to the data are safely persisted.
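
Atomicity and consistency can be seen in a small transfer between two accounts, sketched here with Python's built-in sqlite3 module. The accounts table, the names and the amounts are made up for the example; the CHECK constraint plays the role of a constraint defined for the data as a whole.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency rule
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(connection, source, target, amount):
    """Move amount between accounts as one transaction; return True on success."""
    try:
        with connection:  # commits if the block succeeds, rolls back if it raises
            connection.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            connection.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the credit to target was rolled back as well

def balances(connection):
    return dict(connection.execute("SELECT name, balance FROM accounts ORDER BY name"))

print(transfer(conn, "alice", "bob", 30))   # True
print(balances(conn))                       # {'alice': 70, 'bob': 80}
print(transfer(conn, "alice", "bob", 500))  # False: would make alice negative
print(balances(conn))                       # {'alice': 70, 'bob': 80}, unchanged
```

In the failed transfer the credit to bob had already been executed, yet it does not survive: the two updates succeed or fail as a single unit.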
BASE an alternate to ACID

• When it comes to NoSQL databases, data consistency models can
sometimes be strikingly different than those used by relational databases
(as well as quite different from other NoSQL stores).

• The two most common consistency models are known by the acronyms
ACID and BASE. While they’re often pitted against each other in a battle
for ultimate victory (please someone make a video of that), both
consistency models come with advantages – and disadvantages – and
neither is always a perfect fit.
• In the NoSQL database world, ACID transactions are less fashionable as
some databases have loosened the requirements for immediate
consistency, data freshness and accuracy in order to gain other benefits,
like scale and resilience.

• Here’s how the BASE acronym breaks down:

• Basically Available: The database appears to work most of the time.
• Soft-state: Stores don’t have to be write-consistent, nor do different
replicas have to be mutually consistent all the time.
• Eventual consistency: Stores exhibit consistency at some later point (e.g.,
lazily at read time).
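
Soft state and eventual consistency can be sketched with replicas that receive updates lazily. The EventuallyConsistentStore class and its synchronize step are assumptions for this illustration; real stores propagate updates in the background or repair replicas at read time.

```python
class EventuallyConsistentStore:
    """Writes are acknowledged after one replica; the rest catch up later."""

    def __init__(self, replica_count):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # (replica index, key, value) updates not yet applied

    def write(self, key, value):
        # Basically available: succeed as soon as one replica has the value.
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, key, replica_index):
        # Soft state: the answer depends on which replica is asked.
        return self.replicas[replica_index].get(key)

    def synchronize(self):
        # Eventual consistency: apply queued updates in the order they were written.
        for index, key, value in self.pending:
            self.replicas[index][key] = value
        self.pending.clear()

store = EventuallyConsistentStore(3)
store.write("cart", "book")

print(store.read("cart", 0))  # book
print(store.read("cart", 2))  # None, this replica has not caught up yet

store.synchronize()
print(store.read("cart", 2))  # book, replicas now agree
```

Between the write and the synchronization, clients may see different answers; the guarantee is only that the replicas converge once updates stop arriving.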
Sharding

• Sharding is a partitioning pattern for the NoSQL age.
• Sharding is a method of splitting and storing a single logical dataset in
multiple databases.
• Sharding is also referred to as horizontal partitioning. The distinction of
horizontal vs vertical comes from the traditional tabular view of a
database.
• A database can be split vertically (storing different tables and columns in
separate databases) or horizontally (storing rows of the same table in
multiple database nodes).
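
The horizontal versus vertical distinction can be sketched on a small table held as a list of rows. The sample rows, the modulo rule on id, and the column groups are assumptions chosen for the example.

```python
table = [
    {"id": 1, "name": "Bob", "phone": "(123) 456-7890", "city": "Tumakuru"},
    {"id": 2, "name": "Jane", "phone": "(234) 567-8901", "city": "Mysuru"},
    {"id": 3, "name": "Tara", "phone": "(345) 678-9012", "city": "Tumakuru"},
    {"id": 4, "name": "Tiara", "phone": "(456) 789-0123", "city": "Hubballi"},
]

def split_horizontally(rows, shard_count):
    """Sharding: whole rows of the same table go to different nodes."""
    shards = [[] for _ in range(shard_count)]
    for row in rows:
        shards[row["id"] % shard_count].append(row)
    return shards

def split_vertically(rows, column_groups):
    """Vertical partitioning: different columns go to different databases."""
    return [
        [{column: row[column] for column in columns} for row in rows]
        for columns in column_groups
    ]

shards = split_horizontally(table, 2)
print([row["id"] for row in shards[0]])  # [2, 4]
print([row["id"] for row in shards[1]])  # [1, 3]

# Each vertical part keeps "id" so the parts can be joined back together.
contact, location = split_vertically(table, [["id", "name", "phone"], ["id", "city"]])
print(contact[0])   # {'id': 1, 'name': 'Bob', 'phone': '(123) 456-7890'}
print(location[0])  # {'id': 1, 'city': 'Tumakuru'}
```

Horizontal shards all share the same columns but hold different rows; vertical parts hold every row but only some of the columns.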
