NoSQL Database – Module Overview
Module 1 – Introduction & Storage Architectures (8 hrs)
Outcome: Students will understand the need for NoSQL, the types of NoSQL DBs, and the internal storage mechanisms.
What NoSQL is: history, why it's needed (big data, scalability)
Types of NoSQL: key/value, document, graph, column-oriented databases (HBase)
Storage architectures: document store internals (MongoDB collections, indexes, reliability, scaling); key/value store internals (Redis, Memcached)
Consistency models: eventually consistent DBs, consistent hashing, gossip protocols, hinted handoff
Module 2 – Indexing & Special Collections (8 hrs)
Outcome: Students will learn how to optimize data access using
indexes and how MongoDB handles special storage needs.
Indexing in MongoDB: Compound indexes, $-operators, cardinality, query
optimizer
Unique, sparse indexes, index administration
Special collections & indexes: Capped collections, tailable cursors, TTL indexes
Full-text search, multilingual search, geospatial indexes (2D, 2DSphere)
GridFS (file storage in MongoDB)
Module 3 – Aggregation & Application Design (8 hrs)
Outcome: Students will be able to perform analytics queries on NoSQL data and design applications with efficient schemas.
Aggregation framework in MongoDB: $match, $project, $group, $unwind, $sort, $limit, $skip
Aggregation commands (count, distinct, group), MapReduce examples
Application design: normalization vs denormalization, cardinality (friends/followers example), optimizations for data manipulation
Schema planning, managing consistency, schema migration
Module 4 – Sharding (Scaling Out in MongoDB) (8 hrs)
Outcome: Students will understand how MongoDB handles massive data by distributing it and how to choose shard keys wisely.
Sharding basics: components of a cluster, test setup
Configuring sharding: mongos, config servers, adding shards, chunk splitting, balancing
Choosing a shard key: strategies (ascending, random, hashed, location-based); rules & limitations
Multi-database/collection clusters and manual sharding
Module 5 – Transactions, Consistency & Performance Tuning (8 hrs)
Outcome: Students will be able to analyze trade-offs in consistency models and apply performance optimization techniques.
Transactions & integrity: RDBMS ACID vs CAP theorem trade-offs, distributed ACID
Consistency in MongoDB, CouchDB, Cassandra, Membase
Performance tuning: reducing latency, increasing throughput, scalability laws (Amdahl's Law, Little's Law)
Partitioning, scheduling, communication overhead, compression, MapReduce tuning
HBase coprocessors, Bloom filters
• NoSQL: What It Is and Why You Need It
Module 1 topics:
• Defining NoSQL
• Setting context by explaining the history of NoSQL's emergence
• Introducing the NoSQL variants
• Listing a few popular NoSQL products
Definition: NoSQL is literally a combination of two words: No and SQL. The implication is that NoSQL is a technology or product that counters SQL. This means NoSQL is not a single product or even a single technology. It represents a class of products and a collection of diverse, and sometimes related, concepts about data storage and manipulation.
Challenges of RDBMS
Rigid Schema Assumption: RDBMS expects a fixed schema
(tables, columns).
Dense & Uniform Data Assumption: Works best when data is
consistent and structured.
Index Dependence: Relational queries rely heavily on
indexes.
Scalability Issues: Vertical scaling (bigger server) is expensive;
horizontal scaling (many servers) is hard for RDBMS.
Workarounds Break the Model: To scale, RDBMS uses denormalization, drops constraints, and relaxes ACID, at which point it starts to behave like NoSQL.
• Bit of History
– Non-relational databases are not new
• Even before SQL and relational databases became popular, there were non-
relational data storage methods.
• Example: In mainframes, data was stored in hierarchical or network-based
structures (like IBM IMS in the 1960s).
– They existed in specialized domains
• For example, LDAP (Lightweight Directory Access Protocol) uses a hierarchical
directory database for storing authentication and authorization credentials.
• These were non-relational, but used for specific, narrow purposes (not
general-purpose like SQL).
– Rooted in distributed & parallel computing
• Modern NoSQL systems are designed to work across many servers (clusters)
rather than a single big machine.
• They use parallelism and distribution to handle huge volumes of data and
millions of users simultaneously.
• Big Data
An IDC Digital Universe report claimed that the total size of digital data created and replicated would grow to 35 zettabytes by 2020.
• Challenges of Big Data
Efficiently storing and accessing large amounts of data is difficult. The additional demands of fault tolerance and backups make things even more complicated.
Manipulating large data sets involves running immensely parallel processes. Gracefully recovering from any failures during such a run and providing results in a reasonably short period of time is complex.
Managing the continuously evolving schema and metadata for semi-structured and unstructured data, generated by diverse sources, is a convoluted problem.
• Scalability
– Vertical scaling
– Horizontal scaling
The MapReduce model provides one of the best-known methods to process large-scale data on a horizontal cluster of machines.
MapReduce is a programming model (introduced by Google)
for processing and analyzing large datasets in a distributed
environment.
• It splits a task into two major steps:
• Map → Break data into smaller chunks and process them in
parallel.
• Reduce → Combine (aggregate) the results from all the
chunks into a final answer.
• MapReduce Architecture
• How it Works (Step-by-Step)
– Input data is split into chunks and distributed across
many computers (nodes).
– Map function processes each chunk independently
(parallel processing).
– Intermediate results are grouped by key.
– Reduce function aggregates the grouped results into a final output.
Applications
• Hadoop MapReduce → early big data framework by Apache.
• Search engines (Google) → used for indexing web pages.
• Log analysis, recommendation engines, fraud detection.
Case study 1: Multiply by 2 function
Step 1: Read input
input_list = [1, 2, 3, 4]
Step 2: Map (Transform Each Item)
map(multiply_by_two, input_list) → [2, 4, 6, 8]
Step 3: Reduce (Aggregate Results)
reduce(sum, [2, 4, 6, 8]) → 20
Step 4: Output
sum=20
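A runnable Python version of this pipeline (note that Python's functools.reduce needs a two-argument function, so operator.add stands in for the sum step above):

from functools import reduce
from operator import add

def multiply_by_two(x):
    return x * 2

input_list = [1, 2, 3, 4]

# Map: transform each item independently
mapped = list(map(multiply_by_two, input_list))  # [2, 4, 6, 8]

# Reduce: fold the mapped values into a single result
total = reduce(add, mapped)

print(mapped, total)  # [2, 4, 6, 8] 20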
The functional programming idea of map and reduce is adapted to key/value data.
Case study 2: Say you want to count how many times each word appears in a huge
set of documents.
1.Input
•Documents: "hello world", "hello NoSQL", "hello students"
2.Map phase (break text into key–value pairs)
•"hello world" → [(hello, 1), (world, 1)]
•"hello NoSQL" → [(hello, 1), (NoSQL, 1)]
•"hello students" → [(hello, 1), (students, 1)]
3.Shuffle & Group (system groups by key)
•hello → [1, 1, 1]
•world → [1]
•NoSQL → [1]
•students → [1]
4.Reduce phase (aggregate values)
•hello → 3
•world → 1
•NoSQL → 1
•students → 1
Final Output:
hello: 3 world: 1 NoSQL: 1 students: 1
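The whole pipeline fits in a few lines of plain Python; a sketch of the three phases:

from collections import defaultdict

docs = ["hello world", "hello NoSQL", "hello students"]

# Map phase: emit (word, 1) for every word in every document
pairs = [(word, 1) for doc in docs for word in doc.split()]

# Shuffle & group: collect all values emitted under the same key
grouped = defaultdict(list)
for key, value in pairs:
    grouped[key].append(value)

# Reduce phase: aggregate the list of values for each key
counts = {key: sum(values) for key, values in grouped.items()}

print(counts)  # {'hello': 3, 'world': 1, 'NoSQL': 1, 'students': 1}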
Case study 3: People by Zip Code
Step1: Input (Key/Value Pairs)
[
{"94303": "Tom"},
{"94303": "Jane"},
{"94301": "Arun"},
{"94302": "Chen"}
]
Step 2: Map Function- Group people by zip code.
[
{"94303": ["Tom", "Jane"]},
{"94301": ["Arun"]},
{"94302": ["Chen"]}
]
Step 3: Reduce Function- Apply aggregation (count names per zip code).
[
{"94303": 2},
{"94301": 1},
{"94302": 1}
]
Why types of NoSQL databases?
• Not all data fits neatly into rows and columns (RDBMS model).
Today’s apps handle structured, semi-structured, and unstructured
data. NoSQL provides the right tool for the right job instead of
forcing everything into tables.
• Real-world applications need different data models:
– Key/Value → Simple, fast lookups (e.g., caching, sessions).
– Document → JSON-like flexible data (e.g., user profiles, product catalogs).
– Column-family → Huge datasets with sparse columns (e.g., analytics,
logs).
– Graph → Highly connected data (e.g., social networks, recommendations).
• Each type solves specific problems better than a “one-size-fits-all”
solution.
1) What are Column-Oriented Stores?
• A type of NoSQL database that stores data by
columns instead of rows.
• Inspired by Google Bigtable.
• Data organized using Row Keys and Column
Families.
• Efficient for large, sparse datasets.
Data Model
• Stored as: (Row Key, Column Family:Column Qualifier, Timestamp) → Value
Key Features
• Column families group related columns
• Sparse storage (no empty values stored)
• Flexible schema (different rows may have different columns)
• Time-stamped versions (history tracking)
• Horizontal scalability (petabytes across servers)
Examples of Column-Oriented Stores
- Google Bigtable (proprietary)
- Apache HBase (open-source on Hadoop)
- Hypertable (C++ implementation)
- Cassandra (distributed wide-column store)
- ScyllaDB (C++ high-performance Cassandra clone)
Use Cases
Large-scale analytics (web logs, clickstream)
Time-series data (IoT, sensor data)
Search indexes (inverted index)
User profiles (social networks)
Metadata storage for big data platforms
Row-Oriented vs Column-Oriented
2) Key/Value Store in NoSQL
Concept
• The simplest type of NoSQL
database.
• Stores data as a collection of key–
value pairs, similar to a dictionary
or hash map.
• Each key is unique and retrieves
its associated value.
• Values can be strings, numbers,
JSON, or even binary data
(images, files).
• Key → Value
• user:101 → {name: "Alice", age: 23, city: "New York"}
• user:102 → {name: "Bob", age: 30, city: "London"}
Key Features
• High-speed lookups (direct access by key)
• Schema-free and flexible values
• Horizontally scalable
• Great for caching & session storage
Popular Databases
• Redis → In-memory, caching & queues
• Amazon DynamoDB → Managed
key/value & document store
• Riak KV → Distributed, fault-tolerant; implemented in Erlang, with a bit of C and JavaScript
• Memcached → In-memory caching
system
• Membase → Distributed, memcached-compatible store
• Kyoto Cabinet → Lightweight key/value library
• Cassandra → Implemented in Java
• Voldemort → Implemented in Java
Use Cases
Caching frequently accessed data
User sessions (token storage)
Shopping carts in e-commerce
Leaderboards in gaming apps
IoT data ingestion
Document databases
What are Document Databases?
• A NoSQL database designed to
store, retrieve, and manage
semi-structured data as
documents.
• Documents are usually stored
in formats like JSON, BSON, or
XML.
• Unlike RDBMS tables with rows
& columns, documents are
flexible and can have different
structures.
Example: one document
{
  "id": "101",
  "name": "Alice",
  "email": "alice@example.com",
  "skills": ["Python", "Django"]
}
Example: a collection of users
{
  "_id": "user_101",
  "name": "Alice",
  "email": "alice@example.com",
  "age": 23,
  "interests": ["reading", "traveling"]
}
{
  "_id": "user_102",
  "name": "Bob",
  "email": "bob@example.com",
  "skills": ["Python", "Django"]
}
• Key Features
Schema-less → No predefined schema
needed.
Hierarchical documents → Supports
nested structures (arrays, sub-documents).
Indexing & querying → Index on fields
inside documents.
Horizontal scaling → Distribute documents
across clusters.
Rich queries → Support for filtering,
searching, aggregations.
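A brief PyMongo sketch touching several of these features (assumes a local mongod on the default port; the school database and users collection are illustrative):

from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017")
users = client["school"]["users"]  # hypothetical database/collection

# Schema-less: documents in one collection may have different fields
users.insert_one({"_id": "user_101", "name": "Alice", "age": 23,
                  "interests": ["reading", "traveling"]})
users.insert_one({"_id": "user_102", "name": "Bob", "skills": ["Python"]})

# Indexing on a field inside documents
users.create_index([("name", ASCENDING)])

# Rich queries: filter on nested/typed fields
for doc in users.find({"age": {"$gt": 20}}):
    print(doc)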
Popular Document Databases
MongoDB → Most popular, JSON/BSON-based.
CouchDB → Uses JSON to store data & JavaScript
for queries.
RavenDB → .NET-based document database.
Amazon DocumentDB → Managed MongoDB-
compatible database.
Use Cases
Content Management Systems (CMS) → Blogs, product catalogs.
User profiles & personalization → Flexible user attributes.
E-commerce applications → Products with varying attributes.
IoT data storage → Semi-structured sensor data.
Mobile/web apps → Fast, flexible backend storage.
Difference between RDBMS and Document DB
Feature         | RDBMS (SQL)              | Document DB (NoSQL)
Data model      | Tables (rows & columns)  | Documents (JSON-like)
Schema          | Fixed, rigid             | Flexible, schema-less
Relationships   | Normalized (JOINs)       | Embedded/nested documents
Query language  | SQL                      | JSON-like query (MongoDB, etc.)
Scaling         | Vertical (limited)       | Horizontal (sharding, clusters)
Graph Database
What is a Graph Database?
• A NoSQL database that uses graph
structures (nodes, edges,
properties) to represent and store
data.
• Instead of rows/columns (RDBMS)
or documents (MongoDB), data is
represented as a graph.
• Best suited for highly connected
data (like social networks,
recommendations, fraud detection).
• Core Components
Nodes → Represent entities (e.g., Person, Product, Location).
Edges → Represent relationships between nodes (e.g., FRIEND_OF, PURCHASED,
LOCATED_IN).
Properties → Attributes for nodes/edges.
Example:
• (Alice) -[FRIEND_OF]-> (Bob)
• (Bob) -[WORKS_AT]-> (Company X)
Example Data (Social Network):
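A small social network like the one above can be modeled in plain Python as an adjacency map with labeled edges; a sketch (all names illustrative):

# Adjacency structure: node -> list of (relationship, target) edges
graph = {
    "Alice": [("FRIEND_OF", "Bob")],
    "Bob":   [("WORKS_AT", "Company X"), ("FRIEND_OF", "Carol")],
    "Carol": [],
    "Company X": [],
}

def traverse(start, relationship):
    """Follow edges of one relationship type from a starting node."""
    return [target for rel, target in graph.get(start, []) if rel == relationship]

# Friends-of-friends: two FRIEND_OF hops from Alice
friends = traverse("Alice", "FRIEND_OF")
fof = [f2 for f1 in friends for f2 in traverse(f1, "FRIEND_OF")]
print(friends, fof)  # ['Bob'] ['Carol']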
• Key Features
Schema-free → Flexible, evolving structures.
Efficient for relationships → Directly stores and
queries connections.
Query languages → Cypher (Neo4j), Gremlin, SPARQL.
Traversals → Fast navigation of relationships (friends of
friends, etc.).
Great for graph algorithms (shortest path, centrality,
community detection).
• Popular Graph Databases
Neo4j → Most widely used, Cypher query language.
Amazon Neptune → AWS graph DB (supports Gremlin & SPARQL).
OrientDB → Multi-model DB (graph + document).
ArangoDB → Graph + document + key/value.
• Use Cases
Social Networks → Friend recommendations, follower graphs.
Recommendation Systems → “People who bought X also bought Y.”
Fraud Detection → Identify suspicious connections across transactions.
Knowledge Graphs → Google’s Knowledge Graph, semantic search.
Network/IT Operations → Mapping dependencies in systems.
Working with Column-Oriented Databases
Using tables and columns in relational databases (RDBMS)
Contrasting column databases with RDBMS
Distributed systems: horizontal scaling
Column family identifier
Data evolution is recorded
A single table spans multiple machines in a column-oriented database like Cassandra, HBase, or Bigtable.
1. Why One Machine Isn’t Enough
In huge databases, a table can grow to
billions of rows and millions of columns.
One physical machine cannot store or process
this much data (limitations of storage,
memory, CPU).
So, the system automatically splits the table
into smaller pieces and distributes them
across multiple servers.
2. How Splitting Works
The row key uniquely identifies each row.
Rows are stored in sorted order of row keys.
The table is divided into regions (HBase), partitions (Cassandra), or tablets (Bigtable).
Example: HBase
In HBase, a table is divided into regions based on row key ranges.
Each region is hosted on a different RegionServer (machine).
As data grows, regions split into smaller ranges and migrate to
other servers for load balancing.
•Table Customer with row keys from A → Z
Region 1: A–H → stored on Machine 1
Region 2: I–P → stored on Machine 2
Region 3: Q–Z → stored on Machine 3
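As a toy sketch, routing a row key to its region reduces to a sorted-range lookup (real HBase clients consult cluster metadata instead; the ranges mirror the example above):

import bisect

# Upper bound of each region's row-key range, mirroring the example
region_bounds = ["H", "P", "Z"]
region_hosts = ["Machine 1", "Machine 2", "Machine 3"]

def route(row_key):
    """Find the first region whose key range includes this row key."""
    i = bisect.bisect_left(region_bounds, row_key[0].upper())
    return region_hosts[i]

print(route("Alice"))  # Machine 1 (A falls in the A-H range)
print(route("Kumar"))  # Machine 2 (K falls in the I-P range)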
Column Databases as Nested Maps of Key/Value
Pairs
Unlike relational databases (which store data in rows), column databases store
data in columns grouped into families.
But internally, they don’t use fixed schemas — they treat data like flexible key-
value maps.
Table = { RowKey → { ColumnFamily → { ColumnName → Value } } }
1. Row Key
   • Unique identifier for each row (like UserID = "User101").
   • This is the outermost key in the map.
2. Column Family
   • A logical grouping of columns (like personsInfo, Orders).
   • Acts like a sub-map inside each row.
3. Column Name → Value
   • Each column inside the family is itself a key/value pair.
   • Example: Name → "Alice", Age → 25
Example: Imagine a table of users in Cassandra
UserID  | Name  | Age | City
User101 | Alice | 25  | London
User102 | Bob   | 30  | Paris
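The same users table, written as the nested map in Python dictionary form (the personsInfo family name follows the earlier example):

# Table = { RowKey -> { ColumnFamily -> { ColumnName -> Value } } }
table = {
    "User101": {"personsInfo": {"Name": "Alice", "Age": 25, "City": "London"}},
    "User102": {"personsInfo": {"Name": "Bob", "Age": 30, "City": "Paris"}},
}

# Reading one cell walks the nested keys from the outside in
print(table["User101"]["personsInfo"]["Name"])  # Alice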
• Analogy
Think of a column database like a bookshelf:
Shelf = Row Key (User)
Book = Column Family (Personal Info, Orders, etc.)
Pages in the Book = Column → Value pairs (Name=Alice,
Age=25, …)
Technical example: Webtable
HBase Distributed storage architecture
Analogy: Imagine a library:
• The table = Library
• Regions = Different floors of the library (each
stores certain row ranges).
• Column families = Sections in each floor (e.g.,
Fiction, Science).
• Stores = Bookcases inside each section.
• Physical files = Actual books stored in the library.
• Thin wrapper = A librarian who helps you find
books instead of you searching raw shelves.
Document internal storage
MongoDB stores data in documents (like JSON objects), not in rows
and columns like traditional databases.
Collections:
A collection is like a table in relational databases.
Example 1: Collection: Students
Documents inside:
{ "id": 1, "name": "Alice", "age": 21 }
{ "id": 2, "name": "Bob", "age": 22 }
Example 2:
{ "id": 1, "name": "Alice", "age": 21 }
{ "id": 2, "name": "Bob", "email": "bob@example.com" }
Namespaces: Collections can be separated using namespaces (database + collection name, e.g. school.students).
Unique identifier (_id): every document has a unique _id field; MongoDB generates one automatically if it is not supplied.
• What is a Memory-Mapped File?
A memory-mapped file is a way to make a file on disk behave like it is part of your
computer’s memory (RAM).
Instead of reading/writing to the file using system calls (which are slow), the file is
mapped into virtual memory.
This means: your program can read/write the file as if it’s just an array in memory.
• Why is it Fast?
Normally:
To read from a file → app → system call → disk → OS → back to app (lots of
steps).
With memory mapping:
File contents are directly available in memory space of the program.
No repeated system calls.
Since memory access is way faster than disk access, I/O performance improves.
• Kernel’s Role
The operating system’s kernel manages this memory mapping and page cache.
It automatically keeps the file content in sync between disk and RAM.
So, applications don’t have to worry about the details—they just access memory
normally.
import mmap

# Step 1: Create a sample file
with open("example.txt", "wb") as f:
    f.write(b"Hello MongoDB with memory-mapped files!")

# Step 2: Open the file for reading and writing
with open("example.txt", "r+b") as f:
    # Step 3: Memory-map the file
    mm = mmap.mmap(f.fileno(), 0)
    # Step 4: Read from memory (just like reading from RAM)
    print("Original content:", mm[:].decode("utf-8"))
    # Step 5: Modify the file content via memory (same length, so the mapping size is unchanged)
    mm[6:12] = b"Python"  # Replacing 'MongoD' with 'Python'
    # Step 6: Move file pointer and read again
    mm.seek(0)
    print("Modified content:", mm.read().decode("utf-8"))
    # Step 7: Flush changes to disk and close the mapping
    mm.flush()
    mm.close()
• MongoDB’s Memory-Mapped Storage Strategy
Earlier versions of MongoDB (before WiredTiger became the default engine) relied heavily on
memory-mapped files to store data.
This has pros (speed) but also some side effects.
(a) No Separation Between OS Cache & DB Cache
• In some databases (like Oracle, MySQL), there’s a database-managed cache and the OS has its own
cache.
• In MongoDB’s memory-mapped strategy:
– The OS cache = DB cache.
– No duplicate copies, so less redundancy.
• Advantage: Efficient (no wasted memory).
• Disadvantage: MongoDB loses fine-grained control (depends on OS).
(b) Cache Management Controlled by OS
• Since memory-mapped files rely on virtual memory, the OS decides:
– Which data pages to keep in cache.
– Which pages to evict (remove).
• Problem: Different OS (Linux, Windows, macOS) may behave differently.
• So, MongoDB performance can vary across platforms.
(c) MongoDB Can Use All Available Memory
• With memory mapping, MongoDB automatically expands to use as much RAM as available.
• No special tuning needed.
• If you add more RAM, MongoDB cache effectively grows larger → performance boost.
Some other limitation of memory mapping
• On 32-bit systems → MongoDB can only handle 2 GB database size (because of
memory addressing limits).
• On 64-bit systems → This restriction is removed → databases can be much larger.
• Each MongoDB document has a maximum size: 16 MB in current versions (older releases allowed less).
• Why? Because documents are designed to be lightweight and fast to query.
• If you need to store files larger than this limit (e.g., images, videos), use GridFS.
• GridFS breaks the file into smaller chunks and stores them across multiple documents.
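A minimal PyMongo sketch of GridFS usage (assumes a local mongod; the database and file names are illustrative):

import gridfs
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["mediadb"]  # hypothetical database
fs = gridfs.GridFS(db)

# Store a large payload: GridFS splits it into chunk documents automatically
file_id = fs.put(b"...many megabytes of binary data...", filename="video.mp4")

# Read it back as a single stream
data = fs.get(file_id).read()
print(len(data))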
Namespace limit
• In MongoDB, a namespace is just the unique identifier string that points to
a collection or an index inside a database.
• MongoDB indexes are implemented as B-trees
• Reminder:
• 1 collection = 1 namespace
• 1 index = 1 namespace
• Example: If each collection has 2 indexes →
– Each collection = 3 namespaces (1 + 2).
• Namespace storage File (.ns File)
– MongoDB stores namespace metadata in a file called
<dbname>.ns. This file keeps track of collections and
indexes. Max size of .ns file = 2GB.
• Example: for database mydb, the namespace file is mydb.ns.
• Guidelines for Using Collections and Indexes in MongoDB
  – Thumb rule: "Do I often need to query across all this data together?"
  – Capped collections: fixed-size, FIFO behavior (the oldest documents are overwritten first).
  – _id is always indexed. We can add our own indexes as well. Results come in _id order (or insertion order in capped collections).
• MongoDB Reliability and Durability
Traditional databases (like MySQL, PostgreSQL) guarantee ACID properties
(Atomicity, Consistency, Isolation, Durability).
MongoDB does not guarantee full ACID transactions (at least in older versions;
newer ones support multi-document transactions, but with overhead).
So, in concurrent operations (multiple clients updating same data at once),
conflicts can occur.
Some operations are atomic at the document level.
Example: $inc, $set, $push
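For instance, a PyMongo sketch of a document-level atomic update (the collection and field names are illustrative; each update_one call is atomic for that single document):

from pymongo import MongoClient

carts = MongoClient("mongodb://localhost:27017")["shop"]["carts"]  # hypothetical

# $inc, $set, and $push are applied atomically within one document,
# so two concurrent increments cannot lose an update.
carts.update_one(
    {"_id": "cart_1"},
    {"$inc": {"itemCount": 1},
     "$set": {"status": "active"},
     "$push": {"items": "book-42"}},
    upsert=True,
)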
• Replication for safety
To prevent data loss in failures, MongoDB supports replication.
Replication is asynchronous → changes on master may take time to appear on
slaves.
In older versions → master-slave replication:
One master (primary) → handles reads/writes.
One or more slaves (secondary) → keep a copy of master’s data.
In current versions of MongoDB, master-slave replica pairs have been replaced with replica sets, which typically contain three replicas. Replica sets allow automatic recovery and automatic failover.
Why Sharding?
• MongoDB stores huge datasets that may not fit on one
server.
• Sharding = Horizontal Scaling (splitting collections across
servers).
• Example: Instead of 1 billion docs on one server → split
across 10 servers.
• Shard = Portion of collection stored on one machine.
• Shards are replicated for reliability (Replica Sets).
• Example: 4 shards × 3 replicas = 12 MongoDB servers.
• Each shard is divided into chunks.
• Chunk = continuous range of documents based on the shard key.
• Identified by: minKey, maxKey, and collection.
• MongoDB auto-balances chunks across shards.
Shard key:
• Field(s) used to distribute data across shards.
• Can be a single field [{ userId: 1 }] or compound [{ state: 1, city: 1 }].
• Must be chosen carefully to ensure balanced distribution.
• Bad shard key = unbalanced shards.
Config servers:
• Store metadata about shards, chunk distribution, and shard keys.
• Replicated to avoid a single point of failure.
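As a sketch, the same setup can be driven from PyMongo against a mongos router (the cluster address, database, and shard key below are illustrative assumptions):

from pymongo import MongoClient

# Connect to the mongos router, not to an individual shard
admin = MongoClient("mongodb://mongos-host:27017").admin

# Enable sharding for a database, then shard one collection on a key
admin.command("enableSharding", "appdb")
admin.command("shardCollection", "appdb.users", key={"userId": 1})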
• Understanding Key/Values stores
– Memcached
• Memcached is an open-source, high-performance,
distributed memory caching system.
• It is used to store data in RAM (memory) temporarily
to reduce the number of direct database or API calls.
• Think of it as a fast-access shortcut: instead of hitting
the database every time, apps check Memcached first.
• Key features:
• In-Memory Storage
• Key-Value Store
• Volatile (Non-persistent)
• Distributed System
• Simple protocol (set, get, delete)
• The heart of Memcached is its slab allocator, which helps manage memory
efficiently (instead of using traditional malloc/free).
1) Slabs
• Memory in Memcached is divided into slabs.
• Each slab is responsible for storing values of a specific size range.
• Example: one slab stores objects around 1 KB, another slab for 1.25 KB, etc.
2) Pages
• A slab is further divided into pages.
• Each page is 1 MB in size (default).
• Pages contain chunks (or buckets) where the actual objects are stored.
3) Chunks (Buckets)
• A chunk is the smallest unit of memory allocation inside Memcached.
• Each chunk can store one object (a value + metadata).
• Object placement rule:
– An object is stored in the closest larger chunk size.
– Example:
• Object size: 1.4 KB
• Next available chunk size: 1.5625 KB
• Object stored in that chunk → 0.1625 KB wasted
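The placement arithmetic can be sketched in a few lines of Python (the 1.25 growth factor is Memcached's default; chunk sizes here are simplified to whole-KB units):

# Chunk sizes grow geometrically from a base size by a growth factor
base_chunk_kb = 1.0
growth_factor = 1.25
chunk_sizes = [base_chunk_kb * growth_factor ** i for i in range(5)]
# [1.0, 1.25, 1.5625, 1.953125, 2.44140625]

def place(object_kb):
    """Pick the closest chunk size that still fits the object."""
    chunk = next(size for size in chunk_sizes if size >= object_kb)
    return chunk, chunk - object_kb  # chunk used, space wasted

print(place(1.4))  # (1.5625, 0.1625) -> matches the example above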
Redis :
• Redis (REmote DIctionary Server) is an open-source, in-memory
data store.
• It can be used as a database, cache, or message broker.
• Unlike traditional databases, Redis keeps data in RAM, which makes
it extremely fast.
• Key features of Redis:
  – In-memory storage
  – Key-value store (keys are always strings)
  – Rich data structures: String, List, Set, Sorted Set, Hash, Streams, Bitmaps
  – Persistent storage:
    • RDB (Redis Database Backup)
    • AOF (Append-Only File)
  – Replication & high availability: master-replica replication, automatic failover, sharding across multiple nodes
  – Redis can be used as a message broker with publish/subscribe support.
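A short sketch using the redis-py client (assumes a Redis server on localhost; keys and values are illustrative):

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Plain key/value with an expiry (typical cache/session usage)
r.set("session:42", "alice-token", ex=3600)
print(r.get("session:42"))

# Rich data structures: a sorted set as a leaderboard
r.zadd("leaderboard", {"alice": 120, "bob": 95})
print(r.zrevrange("leaderboard", 0, 1, withscores=True))

# Publish/subscribe as a message broker
r.publish("events", "user-logged-in")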
Consistent hashing
Consistent Hashing is a smarter way of distributing
keys across servers that minimizes reassignments
when servers are added or removed.
• How it Works:
– Imagine a hash space arranged in a ring (circular
structure).
– Both servers and keys are hashed onto this ring.
– A key belongs to the first server clockwise from its
position on the ring.
– When a server is added or removed, only the keys
near that server’s position are reassigned, not all keys.
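A compact Python sketch of the ring (MD5 here is just an arbitrary stable hash; production systems also place multiple virtual nodes per server for smoother balancing):

import bisect
import hashlib

def ring_hash(key):
    """Map any string onto the hash ring (a fixed integer space)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

servers = ["server-A", "server-B", "server-C"]
ring = sorted((ring_hash(s), s) for s in servers)
points = [p for p, _ in ring]

def lookup(key):
    """A key belongs to the first server clockwise from its position."""
    i = bisect.bisect(points, ring_hash(key)) % len(ring)
    return ring[i][1]

print(lookup("user:101"), lookup("user:102"))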
• Object Versioning:
• When multiple clients update the same object at
the same time, conflicts may occur.
To resolve this, systems use object versioning –
each object is given a version identifier whenever
it is modified.
• Instead of overwriting old values, the system
keeps track of different versions of the object.
Example (object versions after concurrent updates): Node A has performed 2 updates and Node B has performed 1 update, so the object's version records [A: 2, B: 1].
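Counts like these are what a vector clock records; a minimal Python sketch, on the assumption that the versioning scheme here is a Dynamo-style vector clock:

# Vector clock: per-node update counters attached to each object version
def bump(clock, node):
    """Record one update performed by a node."""
    clock = dict(clock)
    clock[node] = clock.get(node, 0) + 1
    return clock

def descends(a, b):
    """True if version a already includes every update in version b."""
    return all(a.get(node, 0) >= count for node, count in b.items())

v = {}
v = bump(bump(v, "A"), "A")  # Node A has performed 2 updates
v = bump(v, "B")             # Node B has performed 1 update
print(v)                     # {'A': 2, 'B': 1}

stale = {"A": 1}
print(descends(v, stale))    # True: the stale version can be discarded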