CS-2005 DATABASE SYSTEMS
Week-02
Course Objectives
In this course we aim at studying:
■ Application-Centric
– How to design and implement databases from ‘cradle-to-grave’
– How to query and manipulate databases
■ Systems-Centric & Theory-Centric
– How to refine and speed up data retrieval and manipulation
■ Advanced Topics (A Brief Overview)
– Parallel and distributed DBMSs, NoSQL
– Connection with various DBMS
Outline
1. Types of Data
2. Drawbacks of File System
3. Advantages of DBMS
4. ACID property
5. SQL vs NoSQL
Files
File System
DBMS
Drawbacks of File System
■ Data Redundancy & Data Inconsistency
■ Difficulty in Accessing Data
■ Data Isolation
■ Integrity Problems
■ Atomicity Problems
■ Concurrent Access Anomalies
■ Security Problems
1. Data Redundancy & Data Inconsistency
[Figure: separate files kept by OFFICE, LIBRARY, and Accounts]
2. Difficulty in Accessing Data
[Figure: OFFICE, LIBRARY, and ACCOUNT files]
3. Data Isolation
4. Integrity Problems
5. Atomicity Problems
Account A: initial balance 1000/-
Account B: initial balance 2000/-
Fund transfer of 800/- from Account A to Account B
If the system fails after debiting Account A but before crediting Account B, the 800/- is lost.
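The transfer above can be sketched in code to show where a plain file-based approach breaks. This is only an illustration: the dictionary `balances` stands in for records kept in ordinary files, and the `crash_midway` flag is a hypothetical way to simulate a failure between the two updates.

```python
# Balances from the slide; a dict stands in for records kept in plain files.
balances = {"A": 1000, "B": 2000}


def transfer_without_transaction(balances, src, dst, amount, crash_midway=False):
    """Move `amount` from src to dst in two separate, unprotected steps."""
    balances[src] -= amount      # step 1: the withdrawal is applied immediately
    if crash_midway:
        # Simulated failure (hypothetical flag): the deposit never happens.
        raise RuntimeError("system crashed after the withdrawal")
    balances[dst] += amount      # step 2: the deposit


try:
    transfer_without_transaction(balances, "A", "B", 800, crash_midway=True)
except RuntimeError:
    pass  # nothing undoes step 1, so the data is left half-updated

print(balances)                  # {'A': 200, 'B': 2000}
print(sum(balances.values()))    # 2200 instead of the original 3000
```

Account A has been debited but Account B was never credited, which is exactly the atomicity problem named on this slide.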
6. Concurrent Access Anomalies
7. Security
PURPOSE OF DBMS
DATABASE SYSTEMS OFFER SOLUTIONS TO
ALL THE ABOVE PROBLEMS
Advantages of DBMS over file system
1. Fast Data Access: A DBMS reduces data response time, for example
through indexes and query optimization.
2. Minimized Data Redundancy: A DBMS provides constraints that keep
the same data from being stored in more than one place.
3. Data Consistency: Because a DBMS minimizes data redundancy,
there are fewer copies that can disagree, so data is much easier
to keep consistent.
4. No need to know where the data is stored: The user does not
need to know the location of the file. The user makes a request
from a web application or app, and the server responds
accordingly.
5. Concurrent Access: With a Database Management System, multiple
users can access the database at the same time.
6. Security: A DBMS provides role-based access control. Each
user has a different set of access rights, so the data is protected
from problems such as data leaks and misuse of data.
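The point about constraints and redundancy can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in DBMS; the `student` table, its columns, and the sample values are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The PRIMARY KEY constraint lets each roll number be stored only once.
conn.execute("CREATE TABLE student (roll_no TEXT PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO student VALUES ('22-001', 'Student One')")

try:
    # A second copy of the same record is rejected by the DBMS itself.
    conn.execute("INSERT INTO student VALUES ('22-001', 'Student One')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(duplicate_rejected, count)   # True 1
```

With plain files, nothing stops the office and the library from each keeping their own copy of this record; here the rule is enforced in one place.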
Examples of RDBMS
ACID Property
■ Atomicity: A transaction is all-or-nothing. For example, a
transaction to transfer funds from one account to
another involves a withdrawal from the first account
and a deposit to the second. If the deposit fails,
the withdrawal must not take effect either.
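A minimal sketch of an atomic transfer, using Python's built-in `sqlite3` module. The `account` table, the `transfer` helper, and the `fail_deposit` flag (which simulates a failed deposit) are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 2000)])
conn.commit()


def transfer(conn, src, dst, amount, fail_deposit=False):
    """Withdraw and deposit inside one transaction; undo both on any error."""
    try:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        if fail_deposit:
            raise RuntimeError("deposit failed")   # simulated failure
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
        )
        conn.commit()      # both changes become visible together
        return True
    except Exception:
        conn.rollback()    # the withdrawal is undone as well
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


failed = transfer(conn, "A", "B", 800, fail_deposit=True)
after_failure = balances(conn)     # {'A': 1000, 'B': 2000}: nothing changed

succeeded = transfer(conn, "A", "B", 800)
after_success = balances(conn)     # {'A': 200, 'B': 2800}
```

Either both updates are committed or neither is, so the total of 3000 is preserved in both cases.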
ACID Property (cont.)
■ Consistency: This property ensures that a
transaction preserves the data integrity
constraints, leaving the data consistent. A
transaction takes the data to a new valid state;
if a failure occurs, all data is returned to the
state it was in before the transaction started.
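As a sketch of consistency, the example below declares one integrity constraint (a balance may never be negative) and shows the database rejecting an update that would violate it. It uses Python's built-in `sqlite3` module; the table and the rule itself are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"   # integrity constraint
)
conn.execute("INSERT INTO account VALUES ('A', 1000)")
conn.commit()

try:
    # Withdrawing 1500 from a balance of 1000 would break the constraint.
    conn.execute("UPDATE account SET balance = balance - 1500 WHERE name = 'A'")
    conn.commit()
    rejected = False
except sqlite3.IntegrityError:
    conn.rollback()      # return to the last valid state
    rejected = True

balance = conn.execute("SELECT balance FROM account WHERE name = 'A'").fetchone()[0]
print(rejected, balance)   # True 1000
```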
ACID Property (cont.)
■ Isolation: This property ensures that each
transaction is isolated from the others: its
intermediate changes are neither affected by nor
visible to any other concurrent transaction. A
transaction in progress is not interfered with
by any other transaction until it completes.
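Isolation can be observed with two connections to the same database file. This is a sketch that assumes Python's built-in `sqlite3` module with SQLite's default journal mode; the file name, table, and helper `read_balance` are made up for the example.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "bank.db")

writer = sqlite3.connect(db_path)
writer.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
writer.execute("INSERT INTO account VALUES ('A', 1000)")
writer.commit()

reader = sqlite3.connect(db_path)   # a second, independent connection


def read_balance(conn):
    rows = conn.execute("SELECT balance FROM account WHERE name = 'A'").fetchall()
    return rows[0][0]


# The writer changes the balance but has not committed yet.
writer.execute("UPDATE account SET balance = balance - 800 WHERE name = 'A'")
seen_before_commit = read_balance(reader)   # 1000: uncommitted change is hidden

writer.commit()
seen_after_commit = read_balance(reader)    # 200: visible once committed

writer.close()
reader.close()
```

The reader never sees the half-finished state of the writer's transaction, only the committed values before and after it.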
ACID Property (cont.)
■ Durability: Once a transaction has completed and
committed, its changes are stored permanently
in the database and survive later failures such
as a crash or restart. The saved information
stays as it is until another update or delete
transaction changes it.
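A small sketch of durability using Python's built-in `sqlite3` module and a database file on disk. Closing and reopening the connection only approximates a restart (it does not simulate a power failure), and the file and table names are assumptions for illustration.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "durable.db")

conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO account VALUES ('A', 1000)")
conn.commit()                       # committed: must survive from here on

conn.execute("UPDATE account SET balance = 0 WHERE name = 'A'")
conn.close()                        # closed without commit: this update is discarded

# Reopen the file, as a restarted application would.
reopened = sqlite3.connect(db_path)
rows = reopened.execute("SELECT name, balance FROM account").fetchall()
reopened.close()

print(rows)   # [('A', 1000)]
```

The committed insert is still there after reopening, while the uncommitted update has left no trace.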
Quiz
Find at least 2 design errors in the given database
Types of Data
■ Structured Data
■ Unstructured Data
Structured Data
■ Stored in tabular format
■ Clearly defined
■ Data is stored in a pre-defined data model
■ Think of data that fits neatly within fixed fields and
columns in relational databases and spreadsheets.
■ Examples of structured data include
– names, dates, addresses,
– credit card numbers,
– stock information,
– geolocation, and more.
Structured Data
RDBMS
Structured data is stored in relational databases
Unstructured Data
■ No predefined structure
■ No data model
■ Irregular and ambiguous
■ Examples of unstructured data include
– text,
– video files,
– audio files,
– mobile activity,
– social media posts,
– satellite imagery
■ Non-relational or NoSQL databases are often a good fit for
managing unstructured data.
Types of Databases
Horizontal vs. Vertical Scaling
| Aspect | Horizontal scaling | Vertical scaling |
| --- | --- | --- |
| Description | Increase or decrease the number of nodes in a cluster or system to handle an increase or decrease in workload | Increase or decrease the power of a system to handle increased or reduced workload |
| Example | Add or reduce the number of virtual machines (VMs) in a cluster of VMs | Add or reduce the CPU or memory capacity of the existing VM |
| Execution | Scale in/out | Scale up/down |
| Workload distribution | Workload is distributed across multiple nodes. Parts of the workload reside on these different nodes | A single node handles the entire workload |
| Concurrency | Distributes multiple jobs across multiple machines over the network at once. This reduces the workload on each machine | Relies on multi-threading on the existing machine to handle multiple requests at the same time |
| Required architecture | Distributed | Any |
| Implementation | Takes more time, expertise, and effort | Takes less time, expertise, and effort |
| Complexity and maintenance | Higher | Lower |
| Configuration | Requires modifying a sequential piece of logic in order to run workloads concurrently on multiple machines | No need to change the logic. The same code can run on a higher-spec device |
| Load balancing | Necessary to actively distribute workload across the multiple nodes | Not required in the single node |
| Failure risk | Low because other machines in the cluster offer backup | High since it is a single point of failure |
| Costs | High costs initially; optimal over time | Low-cost initially; less cost-effective over time |
| Networking | Quick inter-machine communication | Slower machine-to-machine communication |
| Performance | Higher | Lower |
| Limitation | Add as many machines as you can | Limited to the resource capacity the single machine can handle |
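The "workload distribution" row for horizontal scaling can be sketched as hash-based partitioning: each record key is mapped to one of several nodes. This is a simplified illustration, not how any particular database does it; the node names and the `node_for_key` helper are hypothetical.

```python
import hashlib
from collections import Counter

NODES = ["node-0", "node-1", "node-2"]   # hypothetical cluster members


def node_for_key(key, nodes):
    """Pick a node for a key by hashing the key and taking it modulo the node count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


keys = [f"user-{i}" for i in range(1000)]
placement = Counter(node_for_key(key, NODES) for key in keys)

# Each key always lands on the same node, and the keys are spread over the cluster.
print(placement)
```

With this simple modulo scheme, adding or removing a node changes the placement of most keys, which is one reason horizontal scaling takes more effort to implement than upgrading a single machine.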
Architectures
SQL
■ It has a predefined schema.
■ Missing values must be stored as NULL (which can waste storage)
■ Modifications require changing the schema or the data
■ Tabular format
■ Not easily scalable (designed for 1990s-era technology or
earlier)
■ Requires joins to combine data from related tables
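The first three points above can be sketched with Python's built-in `sqlite3` module. The `student` table, its columns, and the sample values are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Predefined schema: every row has exactly these columns.
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL, phone TEXT)")

conn.execute("INSERT INTO student (id, name, phone) VALUES (1, 'Student One', '555-0100')")
conn.execute("INSERT INTO student (id, name) VALUES (2, 'Student Two')")   # phone is missing

rows = conn.execute("SELECT id, name, phone FROM student ORDER BY id").fetchall()
print(rows)      # [(1, 'Student One', '555-0100'), (2, 'Student Two', None)]

# Storing a new kind of attribute requires changing the schema first.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
columns = [row[1] for row in conn.execute("PRAGMA table_info(student)")]
print(columns)   # ['id', 'name', 'phone', 'email']
```

The missing phone number comes back as `None`, which is how Python's `sqlite3` represents SQL NULL.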
NoSQL
■ Schema-less database
■ Changes can be easily incorporated
■ Common types:
– Key/value (Dynamo)
– Columnar/tabular (HBase)
– Document (MongoDB)
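To show what "schema-less" means in the document model, the sketch below uses a plain Python list of dictionaries as a stand-in for a document collection. It does not use a real NoSQL client library; `collection` and `find` are hypothetical names for illustration.

```python
collection = []   # hypothetical in-memory stand-in for a document collection

# Documents in the same collection do not need to share the same fields.
collection.append({"_id": 1, "name": "Student One", "phone": "555-0100"})
collection.append({"_id": 2, "name": "Student Two"})                      # no phone
collection.append({"_id": 3, "name": "Student Three", "clubs": ["chess", "robotics"]})


def find(collection, **criteria):
    """Return the documents whose fields match all the given values."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


matches = find(collection, name="Student Two")
with_clubs = [doc["_id"] for doc in collection if "clubs" in doc]

print(matches)      # [{'_id': 2, 'name': 'Student Two'}]
print(with_clubs)   # [3]
```

A new field such as `clubs` is added by simply storing it in a document; no schema change is needed, and no placeholder value is stored for the documents that lack it.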
SQL vs NoSQL
Is NoSQL better than SQL?
■ NoSQL tends to be a better option for modern
applications with complex, constantly changing
data sets that need a flexible data model which
does not have to be defined up front.
■ Many NoSQL databases are built to store and process
data in real time.
■ The trade-off: NoSQL databases typically cannot
enforce constraints, such as uniqueness of keys within
documents, the way traditional relational systems do.
When to Choose SQL
Structured Data: If your data has a well-defined schema with fixed
tables and relationships between them, SQL databases are a
good choice.
ACID Compliance: SQL databases are ACID (Atomicity,
Consistency, Isolation, Durability) compliant, which ensures data
consistency and reliability. If your application requires strict
transaction management, SQL databases are a strong choice.
Complex Queries: SQL databases excel at handling complex
queries, especially those involving multiple joins and
aggregations.
Scalability: SQL databases can be scaled vertically (by adding
more resources to a single server) or, in some cases, horizontally.
While horizontal scalability can be more challenging with SQL
databases, it's still possible with certain configurations.
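The "Complex Queries" point can be illustrated with a query that joins two tables and aggregates the result. This sketch uses Python's built-in `sqlite3` module; the `student` and `loan` tables and their contents are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE loan (
        id INTEGER PRIMARY KEY,
        student_id INTEGER NOT NULL REFERENCES student(id),
        book TEXT NOT NULL
    );
    INSERT INTO student VALUES (1, 'Student One'), (2, 'Student Two');
    INSERT INTO loan (student_id, book)
        VALUES (1, 'Book A'), (1, 'Book B'), (2, 'Book C');
    """
)

# Join the two tables and count the loans per student.
rows = conn.execute(
    """
    SELECT s.name, COUNT(l.id) AS loans
    FROM student AS s
    JOIN loan AS l ON l.student_id = s.id
    GROUP BY s.id, s.name
    ORDER BY loans DESC
    """
).fetchall()

print(rows)   # [('Student One', 2), ('Student Two', 1)]
```

The student names are stored once in `student` and combined with `loan` at query time, rather than being copied into every loan record.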
When to Choose NoSQL
Flexible Schema: NoSQL databases are schema-less or have a
flexible schema, making them suitable for projects with
evolving or unstructured data.
High Throughput: NoSQL databases are often chosen for
high-velocity applications with large volumes of data and high
read/write rates, such as social media platforms, IoT, and
real-time analytics.
Horizontal Scalability: NoSQL databases are designed for easy
horizontal scaling. They can distribute data across multiple
servers or nodes, providing excellent scalability and fault
tolerance.
Variety of Data Models: NoSQL databases come in different
flavors, including document-oriented (e.g., MongoDB),
key-value stores (e.g., Redis), column-family stores (e.g.,
Cassandra), and graph databases (e.g., Neo4j). You can
choose the NoSQL type that best matches your data and
query requirements.
Ultimately, the choice between SQL and NoSQL databases
depends on the data structure, scale, performance
requirements, and the development team's familiarity with
the technology. In some cases, a hybrid approach using both
types of databases within the same project may be the most
appropriate solution to take advantage of their respective
strengths.
QUESTIONS