NoSQL Data Management
Submitted To: Mr. Aravendra Sharma
Submitted By: Dileep Singh, IMS (1900360149021)
CONTENT
Master-Slave Replication
Peer-to-Peer Replication
Sharding and Replication
Consistency
Version Stamps
Map Reduce
Partitioning and Combining
Master-Slave Replication
Master
is the authoritative source for the data
is responsible for processing any updates to that data
can be appointed manually or automatically
Slaves
A replication process synchronizes the slaves with the master
After a failure of the master, a slave can be appointed as the new master very quickly
Pros and cons of Master-Slave Replication
Pros
More read requests:
Add more slave nodes
Ensure that all read requests are routed to the slaves
Should the master fail, the slaves can still handle read requests
Good for read-intensive datasets
Cons:
Write throughput is limited by the master's ability to process updates and to pass those updates on
If the master fails, writes cannot be handled until the master is restored or a new master is appointed
Inconsistency due to slow propagation of changes to the slaves
Bad for datasets with heavy write traffic (see the routing sketch below).
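To make the routing concrete, here is a minimal Python sketch of master-slave replication. The Node and MasterSlaveCluster classes are hypothetical stand-ins, not any particular database's API, and replication is synchronous here only for brevity (real systems replicate asynchronously, which is the source of the staleness noted above).

import random

class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)

class MasterSlaveCluster:
    def __init__(self, master, slaves):
        self.master = master
        self.slaves = slaves

    def write(self, key, value):
        # All updates go through the master, the authoritative source.
        self.master.write(key, value)
        # Replication process: synchronize the slaves with the master.
        for slave in self.slaves:
            slave.write(key, value)

    def read(self, key):
        # Read requests are routed to the slaves to spread the load.
        return random.choice(self.slaves).read(key)

cluster = MasterSlaveCluster(Node("master"), [Node("s1"), Node("s2")])
cluster.write("order:42", "shipped")
print(cluster.read("order:42"))   # "shipped", served by a slave

Adding more slave nodes scales reads, but every write still funnels through the single master, which is exactly the limitation listed above.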
Peer-to-Peer Replication
All the replicas have equal weight; they can all accept writes.
The loss of any of them doesn't prevent access to the data store.
Pros and cons of peer-to-peer replication
Pros:
You can ride over node failures without losing access to data
You can easily add nodes to improve performance
Cons:
Inconsistency!
Slow propagation of changes to copies on different nodes
Inconsistencies on read lead to problems but are relatively transient
Two people can update different copies of the same record stored on different nodes at the same time: a write-write conflict (sketched below).
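A minimal sketch of how such a write-write conflict arises, using a hypothetical Peer class with a per-record counter (illustrative names, not a real store's API):

class Peer:
    def __init__(self, name):
        self.name = name
        self.data = {}   # key -> (value, version counter)

    def write(self, key, value):
        _, version = self.data.get(key, (None, 0))
        self.data[key] = (value, version + 1)

peer_a, peer_b = Peer("A"), Peer("B")
peer_a.write("profile:7", "alice@old.example")   # client 1 writes on peer A
peer_b.write("profile:7", "alice@new.example")   # client 2 writes on peer B

# When the peers synchronize, both hold version 1 of the record with
# different values: a write-write conflict the store must resolve.
val_a, ver_a = peer_a.data["profile:7"]
val_b, ver_b = peer_b.data["profile:7"]
if ver_a == ver_b and val_a != val_b:
    print("write-write conflict:", val_a, "vs", val_b)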
Sharding & Replication
Sharding is the process of storing data records across multiple machines. It
provides support to meet the demands of data growth.
It is not replication of data: each machine stores a different subset of the records.
Sharding allows horizontal scaling of data stored in multiple shards. With sharding,
we can add more machines to meet the demands of growing data and the demands
of read and write operations.
The more machines you add, the more read and write operations your database
can support.
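As an illustration, the following sketch routes each record to one of four machines by hashing its key. The modulo scheme is the simplest possible choice; production systems typically prefer consistent hashing or range partitioning, since modulo forces mass data movement when machines are added. All names are illustrative.

import hashlib

def shard_for(key: str, num_shards: int) -> int:
    # Hash the key and map it to one of the shards.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

shards = [dict() for _ in range(4)]   # four machines, each holds a subset

def put(key, value):
    shards[shard_for(key, len(shards))][key] = value

def get(key):
    return shards[shard_for(key, len(shards))].get(key)

put("user:1001", {"name": "Dileep"})
print(get("user:1001"))            # served by exactly one shard
print([len(s) for s in shards])    # data is spread across machines, not copied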
Consistency
The consistency property of a database means that once data is written to a
database successfully, queries that follow are able to access the data and get a
consistent view of the data.
In practice, this means that if you write a record to a database and then
immediately request that record, you’re guaranteed to see it. It’s particularly
useful for things like Amazon orders and bank transfers.
Consistency in database systems refers to the requirement that any given database
transaction must change affected data only in allowed ways.
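As a minimal illustration of this read-your-writes behaviour, the sketch below uses an in-memory dict as a stand-in for a strongly consistent store (all names are illustrative):

store = {}

def write(key, value):
    store[key] = value   # returns only once the write has succeeded
    return True

def read(key):
    return store.get(key)

# Once the write succeeds, an immediately following read must see it,
# e.g. an Amazon order status or a bank balance.
assert write("account:9", 500)
assert read("account:9") == 500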
Version Stamps
Version stamps help you detect concurrency conflicts. When you read data, then
update it, you can check the version stamp to ensure nobody updated the data
between your read and write.
Version stamps can be implemented using counters, content hashes, timestamps,
or a combination of these.
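For example, a counter-based version stamp supports a compare-and-set update: the write succeeds only if the stamp is unchanged since the read. A minimal sketch, with illustrative names:

store = {"cart:12": ("2 items", 1)}   # key -> (value, version stamp)

def read(key):
    return store[key]                 # returns the value and its stamp

def update(key, new_value, expected_stamp):
    value, stamp = store[key]
    if stamp != expected_stamp:
        # Somebody updated the data between our read and our write.
        raise RuntimeError("concurrency conflict: record changed since read")
    store[key] = (new_value, stamp + 1)

value, stamp = read("cart:12")
update("cart:12", "3 items", stamp)       # succeeds: stamp still matches
try:
    update("cart:12", "4 items", stamp)   # fails: our stamp is now stale
except RuntimeError as e:
    print(e)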
With distributed systems, a vector of version stamps allows you to detect when
different nodes have conflicting updates.
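A minimal sketch of such a vector of version stamps (a vector clock): each stamp holds one counter per node, and comparing two stamps reveals whether one update supersedes the other or the two conflict.

def descends(a: dict, b: dict) -> bool:
    # True if stamp `a` has seen at least everything stamp `b` has seen.
    return all(a.get(node, 0) >= count for node, count in b.items())

v1 = {"node1": 2, "node2": 1}   # an update seen by node1 and node2
v2 = {"node1": 2, "node2": 2}   # a later update made on node2
v3 = {"node1": 3, "node2": 1}   # a concurrent update made on node1

print(descends(v2, v1))                        # True: v2 supersedes v1
print(descends(v3, v2) or descends(v2, v3))    # False: conflicting updates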
Map Reduce
The map task reads data from an aggregate and boils it down to relevant key-
value pairs. Maps only read a single record at a time and can thus be parallelized
and run on the node that stores the record.
Reduce tasks take many values for a single key output from map tasks and
summarize them into a single output. Each reducer operates on the result of a
single key, so it can be parallelized by key.
Reducers that have the same form for input and output can be combined into
pipelines. This improves parallelism and reduces the amount of data to be
transferred.
Map-reduce operations can be composed into pipelines where the output of one
reduce is the input to another operation's map.
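The sketch below walks through one such map-reduce pass in Python: map reads a single order record and emits key-value pairs, and reduce summarizes all the values for one key (the data and names are illustrative):

from itertools import groupby
from operator import itemgetter

orders = [
    {"items": [("puerh", 2), ("dragonwell", 1)]},
    {"items": [("puerh", 3)]},
]

def map_order(order):
    # One record in, key-value pairs out; can run in parallel,
    # on the node that stores the record.
    for product, qty in order["items"]:
        yield (product, qty)

def reduce_product(product, quantities):
    # All values for a single key in, one summary out; parallel by key.
    return (product, sum(quantities))

# Shuffle: group the map output by key, then reduce each group.
pairs = sorted(kv for order in orders for kv in map_order(order))
results = [reduce_product(key, [v for _, v in group])
           for key, group in groupby(pairs, key=itemgetter(0))]
print(results)   # [('dragonwell', 1), ('puerh', 5)]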
Partitioning & Combining
Partitioner:
A partitioner partitions the key-value pairs of intermediate Map-outputs. It partitions the data
using a user-defined condition, which works like a hash function.
The total number of partitions is the same as the number of Reducer tasks for the job.
Combiner:
The Combiner class is used in between the Map class and the Reduce class to reduce the
volume of data transfer between Map and Reduce.
Usually, the output of the map task is large, so the volume of data transferred to the reduce task is high.
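Continuing the order example above, this sketch adds a hash partitioner that assigns each key to one of two reducer tasks, and a combiner that pre-aggregates map output locally so less data crosses the network (illustrative names throughout):

NUM_REDUCERS = 2

def partition(key: str) -> int:
    # Works like a hash function: the same key always reaches the same
    # reducer within one run (real partitioners use a stable hash).
    return hash(key) % NUM_REDUCERS

def combine(map_output):
    # Runs on the map node: sums values per key before anything is sent,
    # shrinking the volume transferred between Map and Reduce.
    combined = {}
    for key, value in map_output:
        combined[key] = combined.get(key, 0) + value
    return list(combined.items())

map_output = [("puerh", 2), ("dragonwell", 1), ("puerh", 3)]
local = combine(map_output)                   # [('puerh', 5), ('dragonwell', 1)]
buckets = [[] for _ in range(NUM_REDUCERS)]   # one bucket per reducer task
for key, value in local:
    buckets[partition(key)].append((key, value))
print(buckets)   # number of partitions equals the number of reducers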
Thank you