Bda CHP 3

Uploaded by

sp1670761

1)

2) CAP Theorem and How It Differs from ACID Property in Databases

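According to the CAP theorem, a distributed data system cannot simultaneously guarantee
consistency, availability, and partition tolerance; it can guarantee at most two of the three.
In practice, network partitions cannot be ruled out, so when one occurs the system must
choose between consistency and availability.
Key terms in the CAP theorem: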
Consistency: Consistency ensures that all nodes in a distributed system show the same view
of the data at all times. When data is updated or written, all subsequent read requests return
the latest version of the data.
Availability: Availability means every request sent to a working node receives a response,
so the system stays operational for read and write requests. Users can access data and
perform transactions even when some nodes are down.
Partition Tolerance: Partition tolerance refers to the system’s ability to function despite
network partitions or disruptions in communication between different parts of the system. In
other words, a partition-tolerant system can handle network failures that cause
communication breakdowns between different parts of the system, without completely
ceasing operation.
Node: In a distributed system, a node is an individual server or instance that stores data and
performs operations such as read and write requests. Each node operates independently but
can communicate with other nodes through a network.
Cluster: In distributed systems, a cluster consists of multiple nodes working together to
provide efficient, reliable, and scalable services for applications and data management.
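
The trade-off described above can be illustrated with a minimal sketch. It is only a toy model under stated assumptions: the names `Replica` and `TinyCluster`, the two-replica setup, and the `partitioned` switch are all hypothetical and do not correspond to any real database. In "CP" mode the cluster refuses writes during a partition (consistency is kept, availability is lost); in "AP" mode it accepts them and the replicas diverge.

```python
class Replica:
    """One node holding its own copy of the data (hypothetical)."""
    def __init__(self, name):
        self.name = name
        self.data = {}


class TinyCluster:
    """Two replicas plus a flag that simulates a network partition."""
    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False

    def write(self, replica, key, value):
        other = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: refuse the write rather than diverge.
                raise RuntimeError("write rejected: cluster is partitioned")
            # Availability first: accept locally; the other replica is stale.
            replica.data[key] = value
            return
        # No partition: the write reaches both replicas.
        replica.data[key] = value
        other.data[key] = value

    def read(self, replica, key):
        return replica.data.get(key)


# Usage: write once normally, then again during a simulated partition.
ap = TinyCluster("AP")
ap.write(ap.a, "x", 1)
ap.partitioned = True
ap.write(ap.a, "x", 2)           # accepted, but only replica a sees it
print(ap.read(ap.a, "x"), ap.read(ap.b, "x"))
```

Real systems sit between these extremes (for example, by accepting writes only on the side of the partition that holds a majority), so the sketch shows the choice, not a production design.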
How CAP Theorem Differs from ACID Property
3)
4) Describe the four ways by which big data problems are handled by NoSQL.

Big data means datasets that are too large or complex for regular database tools to store and
analyze easily. As data continues to grow rapidly, it becomes important to find good ways to
process and use this large amount of data. To handle this challenge, new ideas, methods, tools,
and technologies are needed to turn big data into useful business insights.

NoSQL databases are well suited to managing big data because they have features designed to
handle large volumes of information efficiently. Commonly used NoSQL databases for big data include:
MongoDB
Cassandra
CouchDB
Neo4j

Different ways to handle Big Data problems:

1. The queries should be moved to the data rather than moving data to queries:
Instead of sending large amounts of data across the network to a central place for processing, it’s
smarter to send the query to the data where it is stored. This way, only the query and its results
travel over the network, making queries much faster because data stays at each node.

2. Hash rings should be used for even distribution of data:

It is hard to divide data fairly among the nodes of a distributed database. Hash rings assign each
item a 40-character, random-looking key (for example, the hexadecimal SHA-1 hash of the item's
identifier) and use it to decide which server stores the item. This spreads data evenly across
many servers and balances the network and storage load.
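
As a rough illustration of the idea, the sketch below builds a small hash ring in Python. It assumes SHA-1 as the hash (its hexadecimal digest happens to be 40 characters long) and places each node on the ring several times ("virtual nodes") to smooth the distribution. The class name `HashRing`, the `vnodes` parameter, and the node names are hypothetical and not taken from any particular database.

```python
import bisect
import hashlib


def sha1_hex(text):
    """Return the 40-character hexadecimal SHA-1 digest of a string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


class HashRing:
    """Minimal hash ring: a key is stored on the first node clockwise."""
    def __init__(self, nodes, vnodes=100):
        # Each node gets `vnodes` positions on the ring.
        ring = []
        for node in nodes:
            for i in range(vnodes):
                ring.append((sha1_hex(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring
        # Fixed-length lowercase hex strings sort in numeric order.
        self._positions = [position for position, _ in ring]

    def node_for(self, key):
        position = sha1_hex(key)
        # First ring position after the key's hash, wrapping around at the end.
        index = bisect.bisect_right(self._positions, position) % len(self._ring)
        return self._ring[index][1]


# Usage: count how many of 3000 keys land on each of three nodes.
ring = HashRing(["node-1", "node-2", "node-3"])
counts = {}
for n in range(3000):
    owner = ring.node_for(f"user:{n}")
    counts[owner] = counts.get(owner, 0) + 1
print(counts)
```

A useful property of this layout is that removing a node only reassigns the keys that node owned; keys on the remaining nodes stay where they are.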

3. For scaling read requests, replication should be used:

Replication keeps copies of the data on several nodes, updated in near real time. Read
requests can then be served by different replicas instead of a single node, which improves
read throughput and availability.
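
A minimal sketch of read scaling through replication follows. It makes two simplifying assumptions that real systems often relax: every write is copied synchronously to all replicas, and reads are spread round-robin. The class name `ReplicatedStore` and the `reads_served` counter are hypothetical, added only for illustration.

```python
import itertools


class ReplicatedStore:
    """Toy key-value store that keeps a full copy of the data on each replica."""
    def __init__(self, replica_count=3):
        self.replicas = [dict() for _ in range(replica_count)]
        self.reads_served = [0] * replica_count
        self._next_replica = itertools.cycle(range(replica_count))

    def write(self, key, value):
        # Simplification: update every replica before returning.
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        # Round-robin: each read goes to the next replica in turn.
        index = next(self._next_replica)
        self.reads_served[index] += 1
        return self.replicas[index].get(key)


# Usage: nine reads are shared equally by three replicas.
store = ReplicatedStore(replica_count=3)
store.write("profile:42", {"name": "Asha"})
for _ in range(9):
    store.read("profile:42")
print(store.reads_served)
```

If replicas were updated asynchronously instead, a read could briefly return an older value, which is the eventual-consistency trade-off that many NoSQL systems accept.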
4. Distribution of queries to nodes should be done by the database:
When a query needs data from many nodes, the database itself should route each part of the
query to the nodes that hold the relevant data and then combine the partial results. The query
moves to the data instead of large amounts of data moving back and forth, which improves efficiency.
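
Points 1 and 4 together describe a scatter-gather pattern, which the following sketch tries to show. It is a simplified, in-process model: the `Node` and `Coordinator` classes and the `count_where` method are hypothetical, and threads stand in for network calls. Each node runs the query against its own rows and returns only a small partial result, which the coordinator combines.

```python
from concurrent.futures import ThreadPoolExecutor


class Node:
    """Holds one shard of the data; rows never leave the node."""
    def __init__(self, rows):
        self.rows = rows

    def run(self, predicate):
        # The query executes locally; only a single number is returned.
        return sum(1 for row in self.rows if predicate(row))


class Coordinator:
    """Stands in for the database layer that distributes the query."""
    def __init__(self, nodes):
        self.nodes = nodes

    def count_where(self, predicate):
        # Scatter the query to every node in parallel, then gather the counts.
        with ThreadPoolExecutor(max_workers=len(self.nodes)) as pool:
            partial_counts = list(pool.map(lambda node: node.run(predicate), self.nodes))
        return sum(partial_counts)


# Usage: three shards of order records, one query sent to all of them.
shards = [
    Node([{"amount": 50}, {"amount": 250}]),
    Node([{"amount": 120}]),
    Node([{"amount": 10}, {"amount": 400}, {"amount": 99}]),
]
database = Coordinator(shards)
print(database.count_where(lambda row: row["amount"] > 100))
```

Only the predicate and three integers cross the (simulated) network here, rather than all six rows, which is the saving that point 1 describes.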

5) Differentiate between traditional RDBMS and NoSQL.

| Point No. | Aspect | RDBMS (Relational Database) | NoSQL (Non-relational Database) |
|-----------|--------|-----------------------------|---------------------------------|
| F | Fixed Schema | Uses a fixed schema with tables and columns | Uses flexible or dynamic schemas (document, key-value, graph) |
| L | Language | Uses SQL (Structured Query Language) | Uses various query languages or APIs (depends on type) |
| E | Entities Relation | Data stored in related tables using foreign keys | Data stored without strict relations, often denormalized |
| X | Transaction | Supports complex ACID transactions | Supports simple or limited transactions, often BASE model |
| S | Scalability | Vertical scaling (scale-up) | Horizontal scaling (scale-out) across many servers |
| C | Consistency | Strong consistency guaranteed | Eventual consistency often used to improve availability |
| A | Data Structure | Structured data only | Handles structured, semi-structured, and unstructured data |
| L | Latency | Higher latency for very large datasets | Lower latency, optimized for large distributed data |
| E | Example Use | Banking, ERP, where data integrity is critical | Social media, big data analytics, real-time apps |
| R | Reliability | Single point of failure, backup needed | Designed for fault tolerance and no single point of failure |
