NoSQL Databases
Overview
• Relational databases and the need for
NoSQL
• Common characteristics of NoSQL databases
• Types of NoSQL database
• Key-value
• Document
• Column-family
• Graph
• NoSQL and consistency
Resources for this section
• NoSQL Distilled (Sadalage & Fowler, 2013)
• Key points from this book are summarised here:
• http://martinfowler.com/articles/nosqlKeyPoints.html
Unlike traditional relational databases (RDBMS), which use Structured Query Language (SQL) and are table-based, NoSQL databases can store and retrieve unstructured or semi-structured data.
Why is NoSQL needed?
• Relational databases have been a successful technology
for twenty years, providing persistence, concurrency
control, and an integration mechanism
• However, application developers have been frustrated with
the impedance mismatch between the relational model
and in-memory data structures (objects, arrays, etc.)
• Impedance mismatch: the effort required to split up an
object (e.g. an order) into separate tables when storing
it in a relational database (e.g. an order might have
elements in the product table, customer table, order line
table etc.) only to have to then put it back together
again (using joins)
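To make the impedance mismatch concrete, the sketch below uses Python's built-in `sqlite3` module to split one in-memory order across four normalised tables and then reassemble it with joins. The table and column names (`customer`, `product`, `orders`, `order_line`) are illustrative assumptions, not a schema taken from the source.

```python
import sqlite3

# The application's in-memory object: one order held together as a nested structure.
order = {
    "id": 1,
    "customer": "Ann",
    "lines": [
        {"product": "Laptop", "quantity": 1},
        {"product": "Mouse", "quantity": 2},
    ],
}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER REFERENCES customer(id));
    CREATE TABLE order_line (order_id INTEGER REFERENCES orders(id),
                             product_id INTEGER REFERENCES product(id),
                             quantity INTEGER);
""")

# Storing: the single object is split up across four tables.
cur = conn.execute("INSERT INTO customer (name) VALUES (?)", (order["customer"],))
conn.execute("INSERT INTO orders (id, customer_id) VALUES (?, ?)", (order["id"], cur.lastrowid))
for line in order["lines"]:
    cur = conn.execute("INSERT INTO product (name) VALUES (?)", (line["product"],))
    conn.execute(
        "INSERT INTO order_line (order_id, product_id, quantity) VALUES (?, ?, ?)",
        (order["id"], cur.lastrowid, line["quantity"]),
    )

# Loading: joins are needed to put the object back together again.
rows = conn.execute("""
    SELECT o.id, c.name, p.name, ol.quantity
    FROM orders o
    JOIN customer c ON c.id = o.customer_id
    JOIN order_line ol ON ol.order_id = o.id
    JOIN product p ON p.id = ol.product_id
    WHERE o.id = ?
    ORDER BY p.id
""", (1,)).fetchall()

rebuilt = {
    "id": rows[0][0],
    "customer": rows[0][1],
    "lines": [{"product": r[2], "quantity": r[3]} for r in rows],
}
print(rebuilt == order)  # True
```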
Why is NoSQL needed?
• The vital factor for a change in data storage was the
need to support large volumes of data by running on
clusters. Relational databases are not designed to run
efficiently on clusters.
• NoSQL is an accidental neologism. There is no
prescriptive definition—all you can make is an
observation of common characteristics.
NoSQL common characteristics
• The common characteristics of NoSQL databases are:
• Not using the relational model
• Running well on clusters
• Open-source
• Built for the 21st century web
• Schemaless
• No explicit schema is defined; however, the implicit schema
of a NoSQL database must still be managed in some way
The term "running well on clusters" refers to a database's ability to operate efficiently across multiple servers (nodes) that work together as a single system. This capability is crucial for handling large-scale data and high-traffic loads by distributing the workload across many machines.
NoSQL databases are generally designed to run well on clusters due to several inherent characteristics:
• Horizontal Scalability: NoSQL databases are built to scale out by adding more servers to handle increased load, rather than scaling up by adding more power to a single server. This is known as horizontal scalability.
• Distributed Architecture: Many NoSQL databases are designed with a distributed architecture from the ground up. Data is spread across multiple nodes, and the system can continue operating even if some nodes fail.
SQL Databases and Clusters
Traditional SQL (relational) databases can face challenges when running on clusters, primarily due to their design principles:
• Vertical Scalability: SQL databases typically scale vertically, meaning they improve performance by adding more resources (CPU, memory) to a single server rather than distributing the load across multiple servers.
• ACID Transactions: SQL databases prioritize ACID (Atomicity, Consistency, Isolation, Durability) transactions to ensure data integrity and consistency. Ensuring these properties across a distributed system is complex and can introduce significant overhead.
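The NoSQL "distributed architecture" point can be illustrated with a toy placement scheme: each key is hashed to pick a primary node, and a copy is also kept on the next node so the data survives the failure of one node. This is a simplified hash-mod sketch with hypothetical node names and helpers (`owners`, `put`, `get`); production systems such as Cassandra use consistent hashing instead, so that adding a node moves only a fraction of the keys rather than most of them.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster members
REPLICATION_FACTOR = 2                  # each key is stored on 2 distinct nodes

storage = {node: {} for node in NODES}  # per-node local data
down = set()                            # nodes currently unavailable

def owners(key):
    """Primary node chosen by hashing the key, plus the next node(s) as replicas."""
    digest = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)
    start = digest % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]

def put(key, value):
    # Write to every owning node that is up.
    for node in owners(key):
        if node not in down:
            storage[node][key] = value

def get(key):
    # Read from the first owning node that is up and has the key.
    for node in owners(key):
        if node not in down and key in storage[node]:
            return storage[node][key]
    raise KeyError(key)

for i in range(6):
    put(f"order:{i}", {"total": i * 10})

down.add(owners("order:3")[0])  # simulate failure of this key's primary node
print(get("order:3"))           # {'total': 30}, served by the replica
```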
NoSQL vs. SQL Comparison
Types
• SQL databases: One type (SQL database) with minor variations
• NoSQL databases: Many different types, including key-value stores, document databases, wide-column stores, and graph databases
Development history
• SQL databases: Developed in the 1970s to deal with the first wave of data storage applications
• NoSQL databases: Developed in the 2000s to deal with limitations of SQL databases, particularly concerning scale, replication and unstructured data storage
Examples
• SQL databases: MySQL, Postgres, Oracle Database
• NoSQL databases: MongoDB, Cassandra, HBase, Neo4j
Data storage model
• SQL databases: Individual records (e.g., "employees") are stored as rows in tables, with each column storing a specific piece of data about that record (e.g., "manager", "date hired", etc.), much like a spreadsheet. Separate data types are stored in separate tables, and then joined together when more complex queries are executed. For example, "offices" might be stored in one table, and "employees" in another. When a user wants to find the work address of an employee, the database engine joins the "employee" and "office" tables together to get all the information necessary.
• NoSQL databases: Varies based on NoSQL database type. For example, key-value stores function similarly to SQL databases, but have only two columns ("key" and "value"), with more complex information sometimes stored within the "value" column. Document databases do away with the table-and-row model altogether, storing all relevant data together in a single "document" in JSON, XML, or another format, which can nest values hierarchically.
Schemas
• SQL databases: Structure and data types are fixed in advance. To store information about a new data item, the schema must first be altered (e.g., with ALTER TABLE); on large tables this can be slow and, depending on the system, may lock the table or require downtime.
• NoSQL databases: Typically dynamic. Records can add new information on the fly, and unlike SQL table rows, dissimilar data can be stored together as necessary. For some databases (e.g., wide-column stores), it is somewhat more challenging to add new fields dynamically.
Scaling
• SQL databases: Typically scale vertically, meaning they improve performance by adding more resources (CPU, memory) to a single server rather than distributing the load across multiple servers.
• NoSQL databases: Horizontal Scalability: NoSQL databases are built to scale out by adding more servers to handle increased load, rather than scaling up by adding more power to a single server. This is known as horizontal scalability.
Development model
• SQL databases: Mix of open-source (e.g., Postgres, MySQL) and closed-source (e.g., Oracle Database)
• NoSQL databases: Mostly open-source (some, such as DynamoDB, are proprietary managed services)
Supports transactions
• SQL databases: Yes; updates can be configured to complete entirely or not at all
• NoSQL databases: In certain circumstances and at certain levels (e.g., document level vs. database level)
Data manipulation
• SQL databases: A specific language using SELECT, INSERT and UPDATE statements, e.g. SELECT fields FROM table WHERE…
• NoSQL databases: Through object-oriented APIs
Consistency
• SQL databases: Can be configured for strong consistency
• NoSQL databases: Depends on the product. Some provide strong consistency (e.g., MongoDB) whereas others offer eventual consistency (e.g., Cassandra)
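The consistency difference can be illustrated with the quorum rule used by Dynamo-style stores such as Cassandra: with N replicas, a write acknowledged by W of them and a read that consults R of them are guaranteed to overlap in at least one replica when R + W > N. The sketch below is a simplified model under stated assumptions (a fixed choice of which replicas are written and read, integer version numbers instead of timestamps, no background repair); it is not how any particular product is implemented.

```python
from dataclasses import dataclass

@dataclass
class Replica:
    value: str
    version: int

N = 3  # number of replicas holding the item

def write(replicas, value, version, w):
    # The write is acknowledged once the first `w` replicas accept it; the
    # remaining replicas would be updated later, which this sketch omits.
    for rep in replicas[:w]:
        rep.value, rep.version = value, version

def read(replicas, r):
    # Consult the last `r` replicas and return the newest version seen.
    newest = max(replicas[-r:], key=lambda rep: rep.version)
    return newest.value

# W=1, R=1: R + W <= N, so the read set can miss the write (stale read).
weak = [Replica("old", 1) for _ in range(N)]
write(weak, "new", 2, w=1)
print(read(weak, r=1))    # old

# W=2, R=2: R + W > N, so the read set always includes an updated replica.
strong = [Replica("old", 1) for _ in range(N)]
write(strong, "new", 2, w=2)
print(read(strong, r=2))  # new
```

Lowering R and W trades consistency for latency and availability, which is why products such as Cassandra let the client choose a consistency level per request.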
Four types of NoSQL data model
• Key-value
• Document
• Column-family
• Graph
Relational database: order example
Key-Value Stores:
Structure: Stores data as key-value pairs.
Examples: Redis, DynamoDB, Riak.
Key-value
• Both a key and a value
are stored – in the case
of the order example,
everything to do with that
one order is simply stored
as one value
Key-value
• A key-value store is primarily used when all access to
the database is via the primary key
• The key-value model is the simplest and easiest to
implement. However, it is inefficient when you are only
interested in querying or updating part of a value.
Typical applications: Session storage, caching, user profiles,
preference management.
Data model: Collection of key-value pairs
Strengths: Simplicity - Very straightforward to use with a
simple data model (keys and values).
Weaknesses: Limited Query Capabilities - Lacks advanced
querying capabilities found in other data models
(e.g., complex joins, aggregations).
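The sketch below is a toy in-memory key-value store in Python; the class and the `order:1` key format are hypothetical. It shows why partial updates are inefficient: the store treats each value as an opaque blob, so changing one field of the order means reading, decoding, modifying and rewriting the whole value.

```python
import json
from typing import Dict, Optional

class KeyValueStore:
    """Toy in-memory key-value store: values are opaque bytes to the store."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)

store = KeyValueStore()

# Everything about one order is stored as a single value under one key.
order = {"customer": "Ann", "lines": [{"product": "Laptop", "quantity": 1}]}
store.put("order:1", json.dumps(order).encode("utf-8"))

# Updating one quantity: the store cannot see inside the value, so the
# application must fetch, decode, modify and rewrite all of it.
doc = json.loads(store.get("order:1"))
doc["lines"][0]["quantity"] = 2
store.put("order:1", json.dumps(doc).encode("utf-8"))
```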
1. Session Storage
Example: Web Session Data
A key-value store like Redis can be used to
store session data, where the key is a
session ID and the value is a JSON object
containing session details.
Schema:
Key: session_id
Value: JSON object with session data
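A minimal sketch of session storage, assuming an in-memory dictionary in place of a real server. Redis can expire keys automatically (for example via the `EX` option of `SET`); the hypothetical `SessionStore` class below imitates that with an explicit expiry time per key. The 30-minute lifetime and the injectable clock are assumptions chosen so the example is easy to test.

```python
import json
import time
import uuid

class SessionStore:
    """Key = session ID, value = JSON-encoded session data, plus an expiry time."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data = {}  # session_id -> (expires_at, json_string)

    def create(self, session):
        session_id = uuid.uuid4().hex
        self._data[session_id] = (self.clock() + self.ttl, json.dumps(session))
        return session_id

    def load(self, session_id):
        entry = self._data.get(session_id)
        if entry is None:
            return None
        expires_at, payload = entry
        if self.clock() >= expires_at:  # expired: remove lazily on access
            del self._data[session_id]
            return None
        return json.loads(payload)

# Usage with a fake clock so expiry can be demonstrated instantly.
now = [0.0]
sessions = SessionStore(ttl_seconds=1800, clock=lambda: now[0])
sid = sessions.create({"user_id": "user123", "cart": ["product123"]})
print(sessions.load(sid))  # {'user_id': 'user123', 'cart': ['product123']}
now[0] = 1801              # just over 30 minutes later
print(sessions.load(sid))  # None
```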
2. Caching
Example: Frequently Accessed Data
A key-value store can be used to cache
frequently accessed data, such as the
results of expensive database queries.
Schema:
Key: query_result_key
Value: Cached query result
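Caching of this kind is commonly implemented as the cache-aside pattern: look the key up in the cache first, and only run the expensive query on a miss, storing its result under the key for next time. In this sketch a plain dictionary stands in for the key-value store, and `expensive_query`, its placeholder result and the key format are hypothetical.

```python
import time

cache = {}    # stands in for a key-value cache such as Redis
db_calls = 0  # counts how often the "database" is actually queried

def expensive_query(region):
    """Hypothetical slow query, e.g. an aggregate over many rows."""
    global db_calls
    db_calls += 1
    time.sleep(0.01)          # simulate query latency
    return len(region) * 100  # placeholder result

def cached_query(region):
    key = f"query_result:sales_total:{region}"
    if key in cache:                  # cache hit: no database work
        return cache[key]
    result = expensive_query(region)  # cache miss: run the query...
    cache[key] = result               # ...and store the result for next time
    return result

cached_query("EU")
cached_query("EU")
print(db_calls)  # 1
```

When the underlying data changes, the application must delete or overwrite the cached key; otherwise readers keep receiving the stale result.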
3. User Profiles
Example: User Information Storage
A key-value store can be used to store user
profiles, where the key is a user ID and the
value is a JSON object containing user
information.
Schema:
Key: user_id
Value: JSON object with user profile data
Document
• Stores data as documents
(typically JSON or XML).
• Data is stored in documents
– in the order example,
everything to do with one
order is stored together in
one document
• Data entries are labelled
(e.g. customer id, quantity)
Examples: MongoDB,
CouchDB, RavenDB.
Document
• Documents are self-describing, hierarchical tree
structures which allow nested values associated with
each key.
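• Unlike key-value stores, document databases can query (and index) fields inside a document, so documents can be found by their contents rather than only by key.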
Typical applications: Content management systems, catalogs, user-
generated content.
Data model: Documents (with multiple values in a document)
Strengths: Flexible Schema - Allows for dynamic and flexible data
structures, making it easy to evolve the data model.
Weaknesses: Performance - Can suffer from performance issues
with very large documents or highly nested structures.
1. Content Management Systems (CMS)
Example: Article Storage
A document-based NoSQL database like MongoDB can be used to
store articles in a CMS, where each document represents an article
with various attributes.
Schema:
Collection: articles
Document Fields: title, author, content, tags, published_date
{
"_id": "article123",
"title": "How to Use MongoDB for CMS",
"author": "John Doe",
"content": "This article explains how to use MongoDB for content management systems...",
"tags": ["MongoDB", "CMS", "NoSQL"],
"published_date": "2024-06-03T12:00:00Z"
}
{
"_id": "article456",
"title": "Best Practices for NoSQL Databases",
"author": "Jane Smith",
"content": "In this article, we discuss the best practices for using NoSQL databases...",
"tags": ["NoSQL", "Database", "Best Practices"],
"published_date": "2024-06-02T11:00:00Z"
}
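Because document fields are visible to the database, documents can be selected by their contents. The pure-Python `find` helper below is a hypothetical stand-in for a document database's query API, run over the two article documents above (with `content` omitted for brevity). As in MongoDB, where `db.articles.find({ tags: "NoSQL" })` matches any article whose `tags` array contains `"NoSQL"`, a scalar condition on an array field here means "the array contains this value".

```python
articles = [
    {"_id": "article123", "title": "How to Use MongoDB for CMS",
     "author": "John Doe", "tags": ["MongoDB", "CMS", "NoSQL"],
     "published_date": "2024-06-03T12:00:00Z"},
    {"_id": "article456", "title": "Best Practices for NoSQL Databases",
     "author": "Jane Smith", "tags": ["NoSQL", "Database", "Best Practices"],
     "published_date": "2024-06-02T11:00:00Z"},
]

def find(collection, query):
    """Return documents whose fields match every condition in `query`."""
    def matches(doc):
        for field, expected in query.items():
            actual = doc.get(field)
            if isinstance(actual, list) and not isinstance(expected, list):
                if expected not in actual:  # array field: membership test
                    return False
            elif actual != expected:        # other fields: equality test
                return False
        return True
    return [doc for doc in collection if matches(doc)]

print([a["_id"] for a in find(articles, {"tags": "NoSQL"})])
# ['article123', 'article456']
print([a["_id"] for a in find(articles, {"tags": "CMS", "author": "John Doe"})])
# ['article123']

# Newest first: ISO 8601 UTC strings in one fixed format sort chronologically.
latest = sorted(articles, key=lambda a: a["published_date"], reverse=True)
```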
2. Catalogs
Example: Product Catalog
A document-based NoSQL database can be used to store product catalogs, where each
document represents a product with various attributes.
Schema:
Collection: products
Document Fields: name, description, price, category, stock
{
"_id": "product123",
"name": "Laptop",
"description": "High-performance laptop with 16GB RAM and 512GB SSD.",
"price": 999.99,
"category": "Electronics",
"stock": 50
}
{
"_id": "product456",
"name": "Smartphone",
"description": "Latest model smartphone with advanced features.",
"price": 699.99,
"category": "Electronics",
"stock": 150
}
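The flexible schema can be seen with the product documents above: a new product can carry an extra field without any change to the collection, but every query must then cope with documents that lack it (the implicit schema that, as noted earlier, still has to be managed). The third product and its `warranty_years` field are hypothetical additions for illustration.

```python
products = [
    {"_id": "product123", "name": "Laptop", "price": 999.99,
     "category": "Electronics", "stock": 50},
    {"_id": "product456", "name": "Smartphone", "price": 699.99,
     "category": "Electronics", "stock": 150},
]

# No schema change is needed to store a product with an extra field
# (this product and "warranty_years" are hypothetical).
products.append({"_id": "product789", "name": "Headphones", "price": 199.99,
                 "category": "Electronics", "stock": 0, "warranty_years": 2})

# Queries must tolerate documents that lack a field, hence .get() with a default.
in_stock_under_800 = [p["name"] for p in products
                      if p["stock"] > 0 and p["price"] < 800]
with_warranty = [p["name"] for p in products if p.get("warranty_years", 0) >= 1]

print(in_stock_under_800)  # ['Smartphone']
print(with_warranty)       # ['Headphones']
```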
3. User-Generated Content
Example: Blog Posts
A document-based NoSQL database can be used to store user-generated blog posts,
where each document represents a blog post with various attributes.
Schema:
Collection: blog_posts
Document Fields: title, author, content, comments, posted_date
Sample data
{
"_id": "post123",
"title": "My First Blog Post",
"author": "user123",
"content": "This is the content of my first blog post...",
"comments": [
{"user": "user456", "comment": "Great post!", "date": "2024-06-03T13:00:00Z"},
{"user": "user789", "comment": "Thanks for sharing!", "date": "2024-06-03T14:00:00Z"}
],
"posted_date": "2024-06-03T12:00:00Z"
}
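Embedding the comments inside the post keeps the whole aggregate in one document, so adding a comment is a single-document update. In MongoDB this would typically use the `$push` update operator; the sketch below only mimics the effect on a Python dictionary, and the `add_comment` helper is hypothetical.

```python
post = {
    "_id": "post123",
    "title": "My First Blog Post",
    "author": "user123",
    "comments": [
        {"user": "user456", "comment": "Great post!", "date": "2024-06-03T13:00:00Z"},
        {"user": "user789", "comment": "Thanks for sharing!", "date": "2024-06-03T14:00:00Z"},
    ],
    "posted_date": "2024-06-03T12:00:00Z",
}

def add_comment(doc, user, text, date):
    """Append to the embedded array; post and comments stay one document,
    so a single write stores the whole aggregate."""
    doc["comments"].append({"user": user, "comment": text, "date": date})

add_comment(post, "user123", "Thanks, both!", "2024-06-03T15:00:00Z")
commenters = {c["user"] for c in post["comments"]}
print(len(post["comments"]))  # 3
```

The trade-off ties back to the weakness noted above: an unbounded `comments` array can make a popular post's document very large.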
Column-family
• Stores data in columns
rather than rows. Data is
stored in column families.
• Data is stored with keys that
are linked to groups of
columns (attributes) – for
example, a group or column
family that stores customer
details, another that stores
orders for that customer,
etc.
• Everything about one order
Column-family
• Column family stores allow you to store data with keys
mapped to values and the value grouped into multiple
column families, each column family being a map of
data.
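This "map of maps" structure can be sketched with nested Python dictionaries: row key, then column family, then column name. The family names (`details`, `orders`), row keys and values are hypothetical, chosen to follow the customer/orders example; the helpers are not any product's API.

```python
# Row key -> column family -> column name -> value.
table = {
    "customer:1": {
        "details": {"name": "Ann", "email": "ann@example.com"},
        "orders": {"order:1001": "2024-06-01", "order:1002": "2024-06-03"},
    },
    "customer:2": {
        "details": {"name": "Bob"},  # rows need not share the same columns
        "orders": {},
    },
}

def get_column(row_key, family, column):
    """Read a single cell; missing rows, families or columns give None."""
    return table.get(row_key, {}).get(family, {}).get(column)

def get_family(row_key, family):
    """Read one column family of a row without touching the others."""
    return table.get(row_key, {}).get(family, {})

print(get_column("customer:1", "details", "name"))  # Ann
print(sorted(get_family("customer:1", "orders")))   # ['order:1001', 'order:1002']
```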
Typical applications: Time-series data, data warehousing, logging.
Data model: Columns – column families
Strengths: Scalability - Highly scalable and can handle large
amounts of data distributed across many servers.
Weaknesses: Complex Data Model - More complex to design and
manage compared to key-value and document stores.
Examples
• Apache Cassandra, HBase, ScyllaDB.
1. Time-Series Data
Example: Sensor Data Collection
A column-family NoSQL database like Apache Cassandra can be used to
store sensor data, where each row represents a different timestamp, and
each column within that row contains sensor readings.
Schema:
Row Key: sensor_id + timestamp
Columns: temperature, humidity, pressure
Row Key: sensor1_20230603_120000
Columns: temperature: 22.5, humidity: 55.3, pressure: 1012.8
Row Key: sensor1_20230603_121000
Columns: temperature: 22.6, humidity: 55.1, pressure: 1012.6
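Concatenating `sensor_id` with a fixed-width timestamp in the row key means that, when rows are kept sorted by key (as HBase does), all readings for one sensor in a time window form a contiguous range that a binary search can locate. The sketch below assumes that sorted-key layout; the `sensor2` row and the `scan` helper are hypothetical. In Cassandra the same effect is usually obtained with `sensor_id` as the partition key and the timestamp as a clustering column, rather than a concatenated string.

```python
import bisect

# Rows keyed by sensor_id + timestamp, as in the sample data above.
rows = {
    "sensor1_20230603_120000": {"temperature": 22.5, "humidity": 55.3, "pressure": 1012.8},
    "sensor1_20230603_121000": {"temperature": 22.6, "humidity": 55.1, "pressure": 1012.6},
    "sensor2_20230603_120000": {"temperature": 19.0, "humidity": 60.2, "pressure": 1013.1},  # hypothetical
}
sorted_keys = sorted(rows)  # the store keeps row keys in sorted order

def scan(start_key, end_key):
    """Rows with start_key <= key < end_key, located by binary search
    on the sorted keys instead of examining every row."""
    lo = bisect.bisect_left(sorted_keys, start_key)
    hi = bisect.bisect_left(sorted_keys, end_key)
    return [(k, rows[k]) for k in sorted_keys[lo:hi]]

# All sensor1 readings from 12:00 up to (not including) 12:30 on 3 June 2023.
for key, cols in scan("sensor1_20230603_120000", "sensor1_20230603_123000"):
    print(key, cols["temperature"])
# sensor1_20230603_120000 22.5
# sensor1_20230603_121000 22.6
```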
2. Data Warehousing
Example: Sales Data Storage
A column-family NoSQL database can be
used to store large amounts of sales data,
where each row represents a different
product, and each column represents sales
data from different regions or time
periods.
Schema:
Row Key: product_id
Columns: sales_q1, sales_q2, sales_q3,
sales_q4
3. Logging
Example: Application Log Storage
A column-family NoSQL database can be
used to store application logs, where each
row represents a different log entry and
each column provides details about the
log event.
Schema:
Row Key: log_id + timestamp
Columns: log_level, message, user_id
Graph databases
• Graph databases
organize data into node
and edge graphs; they
work best for data that
has complex relationship
structures
• Examples: Neo4j,
ArangoDB, Amazon
Neptune.
Graph
• Graph databases allow you to store entities
and relationships between these entities.
• Entities are also known as nodes, which have
properties. Think of a node as an instance of an object
in the application.
• Relations are known as edges that can also have
properties. Nodes are organised by relationships which
allow you to find interesting patterns between the nodes.
• The organisation of the graph lets the data be stored
just once and then interpreted in different ways based
on relationships.
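A property graph can be sketched with two collections: nodes with properties, and typed edges that carry their own properties. The node IDs, properties and the `neighbours` helper below are hypothetical; graph databases such as Neo4j expose this through a query language (Cypher) rather than Python dictionaries.

```python
from collections import defaultdict

# Nodes: id -> properties.
nodes = {
    "alice": {"label": "User", "name": "Alice"},
    "bob": {"label": "User", "name": "Bob"},
    "laptop": {"label": "Product", "name": "Laptop"},
}

# Edges: (from, relationship type, to, properties).
edges = [
    ("alice", "FRIENDS_WITH", "bob", {"since": 2019}),
    ("alice", "PURCHASED", "laptop", {"date": "2024-06-03"}),
]

# Index outgoing edges per node so a traversal follows stored links
# directly instead of joining tables.
outgoing = defaultdict(list)
for src, rel, dst, props in edges:
    outgoing[src].append((rel, dst, props))

def neighbours(node_id, rel_type):
    return [dst for rel, dst, _ in outgoing[node_id] if rel == rel_type]

print(neighbours("alice", "PURCHASED"))  # ['laptop']
```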
Graph
Typical applications: Social networks, recommendation
engines, fraud detection.
Data model: Graph (nodes and edges)
Strengths: Relationships - Excellent for
handling complex relationships
between data entities.
Weaknesses: Scalability - Horizontal scalability
can be challenging compared to
other NoSQL models, especially
for very large datasets.
1. Social Networks
Example: User Relationships
A graph NoSQL database like Neo4j can
model social networks by representing
users as nodes and their relationships
(e.g., friendships) as edges.
Schema:
Nodes: User
Edges: FRIENDS_WITH
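A typical social-network query is "friends of friends who are not already friends", which is a breadth-first traversal over `FRIENDS_WITH` edges limited to two hops. The user names and edges below are hypothetical sample data, and `within_hops` is an illustrative helper.

```python
from collections import deque

# FRIENDS_WITH edges between User nodes, stored in both directions
# because friendship is mutual.
friends = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann", "dan", "eve"},
    "dan": {"bob", "cat"},
    "eve": {"cat"},
}

def within_hops(start, max_hops):
    """Breadth-first traversal: users reachable in 1..max_hops steps,
    mapped to their distance from `start`."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        user = queue.popleft()
        if seen[user] == max_hops:  # do not expand beyond the hop limit
            continue
        for friend in friends[user]:
            if friend not in seen:
                seen[friend] = seen[user] + 1
                queue.append(friend)
    return {u: d for u, d in seen.items() if d > 0}

# Friends-of-friends of ann who are not already her friends.
suggestions = sorted(u for u, d in within_hops("ann", 2).items() if d == 2)
print(suggestions)  # ['dan', 'eve']
```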
2. Recommendation Engines
Example: Product Recommendations
A graph NoSQL database can be used to
store and query product
recommendations, where users and
products are nodes, and relationships
indicate purchase history or interest.
Schema:
Nodes: User, Product
Edges: PURCHASED, INTERESTED_IN
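One simple recommendation pattern walks from a user to the products they purchased, back to other users who purchased the same products, and then on to those users' other purchases, ranking products by how often they are reached. This co-purchase counting is an illustrative assumption rather than the only way recommendation engines use graphs; the `purchased` data and `recommend` helper are hypothetical, and `INTERESTED_IN` edges are left out for brevity.

```python
from collections import Counter

# PURCHASED edges: user -> set of products.
purchased = {
    "user1": {"laptop", "mouse"},
    "user2": {"laptop", "mouse", "bag"},
    "user3": {"laptop", "bag"},
    "user4": {"phone"},
}

def recommend(user, limit=3):
    """Follow User -PURCHASED-> Product <-PURCHASED- OtherUser -PURCHASED-> Product
    and rank products the user has not bought by how often they are reached."""
    mine = purchased[user]
    scores = Counter()
    for other, items in purchased.items():
        if other != user and mine & items:  # shares at least one product
            scores.update(items - mine)     # count products new to `user`
    return [product for product, _ in scores.most_common(limit)]

print(recommend("user1"))  # ['bag']
```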
3. Fraud Detection
Example: Transaction Monitoring
A graph NoSQL database can be used to detect fraud by modeling transactions as nodes, and the relationships between them as edges, to identify patterns of suspicious behavior.
Schema:
Nodes: Transaction, User
Edges: MADE_BY, CONNECTED_TO
Sample data:
Nodes:
Transaction: {id: "trans001", amount: 500.00, timestamp: "2024-06-03T12:00:00Z", location: "New York"}
Transaction: {id: "trans002", amount: 1500.00, timestamp: "2024-06-03T12:10:00Z", location: "London"}
User: {id: "user123", name: "John Doe"}
Edges:
(trans001)-[MADE_BY]->(user123)
(trans002)-[MADE_BY]->(user123)
(trans001)-[CONNECTED_TO]->(trans002)
Types of NoSQL databases
Reasons to use NoSQL
1. To improve programmer productivity by using a database
that better matches an application's needs.
• e.g. removing impedance mismatch by storing objects
together in aggregates rather than splitting them up
into relational tables
2. To improve data access performance via some
combination of handling larger data volumes,
reducing latency, and improving throughput.
• When a database is large enough to be split over
several database servers, NoSQL may be a
good option
Reasons to stick with Relational DBs
• They are well known, so it is easier to find
people with experience of using them
• The technology is more mature and less likely to
encounter problems
• Many other tools are built on relational technology
"A DBA walks into a NOSQL
bar, but turns and leaves
because he couldn't find a
table"
Polyglot persistence
• Polyglot: the ability to speak multiple languages
• Sadalage and Fowler predict that, in the future, developers will make
use of a range of different technologies for the
persistence (storage) of data
• Relational databases and types of NoSQL databases
can be utilised as necessary to solve the particular
problems to which they are best suited