Advanced Databases Unit 2

The document provides an overview of advanced databases, including modern databases, NoSQL, NewSQL, and RDBMS, explaining their structures, use cases, and differences. It also covers various database management tools, ETL processes, and the distinctions between OLTP and OLAP systems. Additionally, it discusses data preparation and cleaning techniques essential for ensuring data accuracy and usability.

Uploaded by

18-TYCM-I-Gaurav Gangurde

ADVANCED DATABASES

Unit 2 ( Module 1 )

 Introduction To Modern Databases

 What is a Database?
A database is like a digital storage room where data is kept. Imagine a huge,
organized file cabinet. It stores all kinds of information like customer details, product
information, or transaction records.
 Modern Databases:
Modern databases are designed to store, manage, and quickly retrieve large amounts
of data, even as that data grows rapidly. They use newer technologies than older
systems to keep the data organized, easy to access, and secure. They are essential
for everything from small apps to massive companies with millions of users.

Examples of Where Modern Databases Are Used:

 E-commerce websites use databases to store product information, customer
orders, and payments.
 Social media platforms use databases to store user profiles, posts, and comments.
 Banks use databases to track account information and transactions.

 NoSQL, NewSQL
1. NoSQL:
NoSQL stands for "Not Only SQL". It is a type of database designed for storing
and managing large amounts of data that may not fit well into traditional relational
databases.
NoSQL databases can spread huge amounts of data across many servers.
They can store data in different formats, such as key-value pairs, documents,
wide columns, or graphs.

 Types of NoSQL Databases:

o Document-Based: Stores data in documents (e.g., JSON or BSON format).
Example: MongoDB.
o Key-Value Stores: Stores data as key-value pairs. Example: Redis.
o Column-Based (Wide-Column): Stores data in rows grouped into column families,
where each row can have a different set of columns. Example: Cassandra.
o Graph-Based: Designed for relationships between data (e.g., social networks).
Example: Neo4j.
 When to Use: NoSQL is ideal for projects that need to handle:
o Large amounts of unstructured or semi-structured data.
o Quick scalability and flexibility.
o Real-time data, like social media or IoT (Internet of Things) data.
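
As a rough illustration of the four data models listed above, the same kind of user data can be sketched with plain Python structures. No real database is involved here; the variable names (`kv_store`, `document`, `wide_column`, `graph_edges`) and the sample values are assumptions made only for this example.

```python
# Illustrative only: plain Python structures standing in for NoSQL data models.

# Key-value: a value looked up by a single key (the style used by Redis).
kv_store = {"user:1:name": "Asha", "user:1:city": "Pune"}

# Document: one self-contained, nested record (the style used by MongoDB).
document = {
    "_id": 1,
    "name": "Asha",
    "address": {"city": "Pune"},
    "tags": ["admin", "beta"],
}

# Wide-column: rows keyed by id; each row may hold a different set of columns.
wide_column = {
    1: {"name": "Asha", "city": "Pune"},
    2: {"name": "Ravi"},  # this row simply has no "city" column
}

# Graph: nodes plus explicit relationships between them (the style used by Neo4j).
graph_edges = [("Asha", "FOLLOWS", "Ravi"), ("Ravi", "FOLLOWS", "Asha")]

# Relationship query: whom does Asha follow?
followed_by_asha = [dst for src, rel, dst in graph_edges
                    if src == "Asha" and rel == "FOLLOWS"]
print(followed_by_asha)
```

The point of the sketch is only the shape of the data: none of these structures requires a schema to be declared up front.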

2. NewSQL:
NewSQL is a newer category of databases that aims to provide the advantages of SQL
(structured data and relational models) together with the scalability and
performance that NoSQL databases offer.
It is designed to scale horizontally, which means it can handle increased traffic and
large amounts of data by adding servers (just like NoSQL).
It also supports transactional processing (as needed in banking systems).

 What it is: NewSQL is built to combine the best of both worlds: it supports traditional
SQL (structured queries, transactions) but can handle large-scale data and
distributed architectures like NoSQL.
 Popular NewSQL Databases:
o Google Spanner: A distributed relational database that can scale horizontally
while maintaining strong consistency guarantees.
o CockroachDB: A distributed SQL database that is easy to scale while
maintaining SQL features.
o VoltDB: A high-performance NewSQL database designed for fast transactions.
 When to Use: NewSQL is useful when you need:
o Relational data but also need to scale to handle high traffic.
o Strong consistency and ACID transactions at a large scale.
o High availability with minimal downtime.

 RDBMS Databases
RDBMS (Relational Database Management System):
An RDBMS is a type of database that stores data in an organized way, using tables
that are related to each other. It's like a digital spreadsheet where the data is
structured into rows and columns.
Example:

StudentID   First_Name   Last_Name   Age   Major
1           John         Doe         20    Computer Science
2           Jane         Smith       22    Mathematics

This is a simple example of an RDBMS table where:

 The columns represent attributes (like name, age, major).

 Each row represents a single student.

 Examples of RDBMS software: MySQL, PostgreSQL, Oracle, SQL Server.
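
The student table above can be reproduced as a minimal sketch using SQLite through Python's built-in `sqlite3` module. SQLite is chosen here only because it needs no server; the table and column names (`students`, `student_id`, and so on) are assumptions based on the example, not part of the original notes.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The schema is fixed up front: every row has the same columns.
conn.execute(
    """
    CREATE TABLE students (
        student_id INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL,
        age        INTEGER,
        major      TEXT
    )
    """
)

# Each tuple becomes one row (one student).
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?, ?)",
    [
        (1, "John", "Doe", 20, "Computer Science"),
        (2, "Jane", "Smith", 22, "Mathematics"),
    ],
)

# Query by attribute: students older than 21.
rows = conn.execute(
    "SELECT first_name, major FROM students WHERE age > ? ORDER BY student_id",
    (21,),
).fetchall()
print(rows)
conn.close()
```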

 NoSQL Vs RDBMS Databases

Feature        NoSQL                                                         RDBMS (SQL)
Data Model     Flexible (documents, key-value, graphs, etc.)                 Structured (tables with rows and columns)
Schema         No fixed schema (can change over time)                        Fixed schema (predefined structure)
Scaling        Horizontal scaling (across many servers)                      Vertical scaling (requires stronger hardware)
Transactions   Not always ACID-compliant (eventual consistency)              ACID-compliant (strong consistency and reliability)
Performance    High performance, especially for large datasets               Optimized for complex queries and transactions
Use Cases      Big data, real-time apps, flexible data (social media, IoT)   Financial systems, CRMs, inventory systems, reporting
Examples       MongoDB, Cassandra, Redis, Neo4j                              MySQL, PostgreSQL, Oracle, SQL Server
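
To make the "Transactions" row more concrete, here is a small sketch of an atomic money transfer using Python's built-in `sqlite3` module. The `accounts` table, the `transfer` function, and the amounts are hypothetical; the sketch only shows the "all or nothing" behaviour of an ACID transaction, not how any particular bank system works.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint forbids negative balances.
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money atomically: both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, 1, 2, 30)       # enough money: both updates are committed
failed = transfer(conn, 1, 2, 500)  # debit violates CHECK: the credit is undone too
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(ok, failed, balances)
```

The credit is deliberately executed before the debit so that the rollback is visible: without the transaction, account 2 would keep money that account 1 never paid.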

 Tools
1. Database Management Systems (DBMS):

These are the core tools used to create, manage, and interact with databases. They allow
users to store, retrieve, and manipulate data.

 Examples:

o MySQL: A popular open-source relational database system.

o PostgreSQL: Another open-source relational database system known for its advanced
features.

o MongoDB: A NoSQL document database used for flexible data storage (JSON-like
documents).

2. ETL Tools (Extract, Transform, Load):

ETL tools are used to move and manipulate data from different sources and load it into a
data warehouse or database.

 Extract: Getting data from various sources.

 Transform: Cleaning or converting the data into a suitable format.

 Load: Putting the data into the final destination (like a data warehouse).

 Examples:

o Informatica: A powerful tool used for data integration.

o Talend: An open-source ETL tool that helps in connecting and transforming data.

o Apache NiFi: A tool for automating the flow of data between systems.
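
The three ETL steps can be sketched in a few lines of standard-library Python. This is not how Informatica, Talend, or NiFi work internally; it is a toy pipeline under assumed inputs. The CSV text in `RAW_CSV`, the `orders` table, and the function names `extract`, `transform`, and `load` are all made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this could be a file, an API, or a database.
RAW_CSV = """order_id,customer,amount
1, alice ,19.99
2,BOB,5.00
3,carol,
"""

def extract(text):
    """Extract: read raw rows from the CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: tidy names, convert types, and skip rows with no amount."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # this sketch drops incomplete rows
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().title(), float(row["amount"]))
        )
    return cleaned

def load(conn, rows):
    """Load: write the cleaned rows into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stands in for the data warehouse
load(conn, transform(extract(RAW_CSV)))
loaded = conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall()
print(loaded)
```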

3. Data Warehousing Tools:

These are used to store and manage large amounts of historical data that come from
various sources, making it easier for businesses to run reports and analyze trends.

 Examples:

o Amazon Redshift: A cloud-based data warehouse that can handle large datasets.

o Google BigQuery: A tool for running fast, SQL-like queries on massive amounts of
data in the cloud.

4. Database Performance Tuning Tools:

These tools help optimize and monitor how well a database is running. They make sure the
database is fast, efficient, and can handle a lot of queries.

 Examples:

o Oracle Enterprise Manager: Helps monitor and manage Oracle databases.

o SQL Server Profiler: Traces and analyzes the queries sent to SQL Server to
identify slow statements.

o pgAdmin: A tool for managing PostgreSQL databases and inspecting their
performance.

5. Backup and Recovery Tools:

These tools ensure that your data is safe and can be restored if something goes wrong, like a
system failure or human error.

 Examples:

o Veeam: A backup and recovery tool for both databases and virtual
environments.

o RMAN (Recovery Manager): A tool for backing up and recovering Oracle databases.

6. Data Migration Tools:

These tools help you move data from one system or format to another, such as moving data
between different databases or to the cloud.

 Examples:

o AWS Database Migration Service: Helps you move databases to the cloud
with minimal downtime.

o Microsoft Data Migration Assistant: Used to migrate databases to SQL Server.

7. NoSQL Database Tools:

These tools help manage and interact with NoSQL databases that store data in ways other
than traditional tables (e.g., key-value pairs, documents, or graphs).

 Examples:

o MongoDB Compass: A GUI tool for MongoDB that helps visualize and analyze
data.

o Cassandra Query Language (CQL): The SQL-like query language used to interact with
Apache Cassandra (a NoSQL database), typically through the cqlsh shell.

8. Database Security Tools:

These tools ensure that the data is protected and only authorized users can access or modify
it.

 Examples:

o IBM Guardium: Monitors and protects sensitive data in databases.

o Oracle Audit Vault: A tool for monitoring database security and compliance.

9. Data Visualization and Reporting Tools:

These tools help create reports and visualizations of the data stored in databases, making it
easier to analyze trends and make decisions.

 Examples:

o Tableau: A popular tool for creating visualizations and dashboards from database
data.

o Power BI: A Microsoft tool that connects to various databases and creates
interactive reports and dashboards.

 OLTP & OLAP

OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) are two types
of database systems used for different purposes.

1. OLTP (Online Transaction Processing):

o It's designed for handling everyday transactions and operations.

o Example: When you make a purchase online, check your bank account
balance, or update your contact details, these are all OLTP activities.

o Focus: Speed, accuracy, and handling many small transactions at once (like
inserting, updating, or deleting records).

o Databases are usually highly normalized (organized to minimize redundancy).

Example: An e-commerce website where every time a customer buys something, the system
records the transaction, updates the inventory, and adjusts the customer's order history.

2. OLAP (Online Analytical Processing):

o It's designed for complex data analysis and reporting, often using historical
data.
o Example: Looking at business trends over the past year, running reports on
sales performance by region, or analyzing data for decision-making.

o Focus: Complex queries, aggregations, and summarizations of large datasets,
often for decision-making.

o Databases are usually denormalized (some redundancy is accepted so that
analytical queries need fewer joins and run faster).

Example: A company’s manager might run an OLAP query to find out how sales have
changed over the last 5 years in different regions.

Key Differences:

 OLTP is about fast and efficient handling of transactions, while OLAP is about
analyzing large amounts of data for patterns and trends.

 OLTP databases have lots of small updates, inserts, and deletions, whereas OLAP
databases focus on large read-heavy operations, like summarizing and analyzing
data.
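
The difference in workload can be sketched with one small table. Real OLTP and OLAP systems are usually separate products with different storage designs; here a single in-memory SQLite database is used only to contrast the two query styles. The `sales` table and its values are assumptions for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)

# OLTP-style work: many small writes, each touching one or a few rows.
with conn:
    conn.execute("INSERT INTO sales (region, year, amount) VALUES ('North', 2023, 120.0)")
    conn.execute("INSERT INTO sales (region, year, amount) VALUES ('North', 2024, 150.0)")
    conn.execute("INSERT INTO sales (region, year, amount) VALUES ('South', 2024, 90.0)")
    conn.execute("UPDATE sales SET amount = 95.0 WHERE id = 3")

# OLAP-style work: one read-heavy query that scans and summarizes many rows.
report = conn.execute(
    "SELECT region, year, SUM(amount) FROM sales "
    "GROUP BY region, year ORDER BY region, year"
).fetchall()
print(report)
```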

 Data Preparation & Cleaning Techniques

In an advanced database context, data preparation and cleaning techniques are all
about making sure the data you work with is accurate, consistent, and usable for
analysis or further processing. Here are the most common techniques:
1. Handling Missing Data
 Why?: Missing data can mess up your analysis, so it's important to deal with it.
 How?:
o Remove Missing Data: If only a small share of the values is missing, you can
simply remove the rows or columns that contain them.
o Fill with Defaults (Imputation): Replace missing values with a standard
substitute such as the mean, median, or most frequent value.
o Prediction: Use algorithms to predict what the missing values should be
based on other data.
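
A minimal sketch of the first two options, assuming missing values are represented as `None` in a plain Python list (the `ages` data is made up):

```python
from statistics import mean, median

ages = [20, None, 22, 35, None, 21]  # None marks a missing value

# Option 1: remove the missing entries.
known = [a for a in ages if a is not None]

# Option 2: fill the gaps with the mean or the median of the known values.
filled_mean = [a if a is not None else mean(known) for a in ages]
filled_median = [a if a is not None else median(known) for a in ages]

print(known)
print(filled_mean)
print(filled_median)
```

The median is often preferred when the data contains extreme values, because a single large number (35 here) pulls the mean upward more than the median.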

2. Removing Duplicates
 Why?: Duplicate data can distort your results, making them inaccurate.
 How?: Find and remove rows that are exactly the same to ensure that each record is
unique.
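
One simple way to drop exact duplicates while keeping the original order is sketched below. The `records` list is hypothetical, and each record is a tuple so that it can be used as a dictionary key.

```python
records = [
    (1, "John", "Doe"),
    (2, "Jane", "Smith"),
    (1, "John", "Doe"),  # exact duplicate of the first row
]

# dict.fromkeys keeps the first occurrence of each key and preserves order.
unique = list(dict.fromkeys(records))
print(unique)
```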
3. Standardizing Data
 Why?: Data may come from different sources with different formats (like dates in
various formats), which can cause confusion.
 How?:
o Consistent Formats: Make sure everything is in the same format (e.g., dates
should all be in YYYY-MM-DD).
o Scaling: If you're working with numbers, sometimes you need to normalize or
standardize them (scaling to a specific range or making them comparable).
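
Both steps can be sketched as follows. The list of accepted input formats in `KNOWN_FORMATS` is an assumption; a real dataset would need its own list. The scaling shown is min-max scaling to the range 0 to 1, which is one common choice among several.

```python
from datetime import datetime

# Assumed input formats; extend this list to match the actual sources.
KNOWN_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%Y.%m.%d"]

def to_iso(text):
    """Return the date as YYYY-MM-DD, trying each known format in turn."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")

def min_max_scale(values):
    """Scale numbers linearly so the smallest becomes 0 and the largest 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values equal: avoid dividing by zero
    return [(v - lo) / (hi - lo) for v in values]

iso_dates = [to_iso(d) for d in ["2024-03-05", "25/12/2024", "2024.01.31"]]
scaled = min_max_scale([10, 20, 30])
print(iso_dates, scaled)
```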

4. Handling Outliers
 Why?: Outliers (data points far from the norm) can skew your analysis and make
results unreliable.
 How?: Identify and either remove outliers or transform them to be in line with other
data, depending on their significance.
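
One widely used way to identify outliers is the interquartile range (IQR) rule, sketched below. The notes do not name a specific method, so the IQR rule, the multiplier `k=1.5`, and the `prices` data are assumptions chosen for illustration.

```python
from statistics import quantiles

def iqr_bounds(values, k=1.5):
    """Return (low, high); values outside this range are treated as outliers."""
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

prices = [10, 12, 11, 13, 12, 11, 95]
low, high = iqr_bounds(prices)

outliers = [p for p in prices if p < low or p > high]
kept = [p for p in prices if low <= p <= high]
print(outliers, kept)
```

Whether an outlier should be removed, capped, or kept depends on whether it is an error or a genuine but rare value.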

5. Dealing with Categorical Data

 Why?: Many machine learning algorithms can't work with categories like "yes", "no",
"red", "blue" directly.
 How?: Convert these categories into numbers or one-hot encode them (creating
separate columns for each category).
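
One-hot encoding can be sketched in plain Python as below. The helper name `one_hot` and the `colors` data are made up; data-science libraries provide their own versions of this operation.

```python
def one_hot(values):
    """Return the sorted category list and one 0/1 row per input value."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

colors = ["red", "blue", "red", "green"]
categories, encoded = one_hot(colors)
print(categories)
print(encoded)
```

Each row contains exactly one 1, in the column that matches the original category.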

6. Text Data Cleaning

 Why?: If you're working with text data (like customer reviews or tweets), it might
contain extra or irrelevant information.
 How?:
o Remove unwanted characters (like punctuation or special symbols).
o Lowercase everything to make it uniform.
o Remove common words (like "the", "is", "and") that don’t add much
meaning.
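
The three steps can be combined into one small function. The stop-word list `STOP_WORDS` below is a tiny illustrative sample, not a complete list, and the function name `clean_text` is an assumption.

```python
import re

STOP_WORDS = {"the", "is", "and", "a", "to"}  # tiny illustrative list

def clean_text(text):
    """Lowercase, strip punctuation, and drop common stop words."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # keep only letters, digits, whitespace
    return [word for word in text.split() if word not in STOP_WORDS]

tokens = clean_text("The delivery is FAST, and the price is great!!!")
print(tokens)
```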

7. Fixing Inconsistent Data

 Why?: Sometimes data entries aren’t consistent (e.g., "USA" vs "U.S.A." or "NY" vs
"New York").
 How?: Standardize the way things are written, making sure they all follow the same
naming rules.
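
A simple approach is a lookup table that maps every known variant to one canonical spelling. The `CANONICAL` mapping below is a hypothetical example and would need to be built from the variants actually present in the data.

```python
# Keys are lowercase variants; values are the canonical spelling.
CANONICAL = {
    "usa": "USA",
    "u.s.a.": "USA",
    "united states": "USA",
    "ny": "New York",
    "new york": "New York",
}

def normalize(value):
    """Map a known variant to its canonical form; leave unknown values trimmed."""
    key = value.strip().lower()
    return CANONICAL.get(key, value.strip())

normalized = [normalize(v) for v in [" U.S.A. ", "usa", "NY", "Texas"]]
print(normalized)
```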

8. Converting Data Types

 Why?: Data may be stored incorrectly (e.g., numbers stored as text or dates stored as
plain text), making it hard to work with.
 How?: Convert data into the right type (e.g., turning a string of numbers into actual
numeric values).
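
A small sketch of type conversion, assuming numbers may contain thousands separators and dates are already in YYYY-MM-DD form. The `row` data and the helper `to_int` are made up for this example.

```python
from datetime import date

def to_int(text):
    """Convert text such as ' 1,200 ' to an int; return None if it is not a number."""
    try:
        return int(text.replace(",", "").strip())
    except ValueError:
        return None

row = {"quantity": " 1,200 ", "ordered_on": "2024-03-05", "discount": "n/a"}

typed = {
    "quantity": to_int(row["quantity"]),
    "ordered_on": date.fromisoformat(row["ordered_on"]),
    "discount": to_int(row["discount"]),  # not numeric, so it becomes None
}
print(typed)
```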
9. Data Transformation
 Why?: Sometimes data needs to be changed to make it more useful for analysis.
 How?:
o Log Transformation: For very large numbers, taking the logarithm can make
the data easier to analyze.
o Feature Engineering: Create new columns from existing data, like splitting a
"date" column into "day", "month", and "year".

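Both transformations from section 9 (log transformation and feature engineering) can be sketched briefly. The `revenues` values and the helper `split_date` are assumptions; base-10 logarithms are used here, although natural logarithms are equally common.

```python
import math
from datetime import date

# Log transformation: values spanning several orders of magnitude
# become evenly spaced. All inputs must be greater than zero.
revenues = [10, 1_000, 100_000]
log_revenues = [math.log10(r) for r in revenues]

# Feature engineering: derive new columns from an existing date column.
def split_date(d):
    """Split a date into separate day, month, and year features."""
    return {"day": d.day, "month": d.month, "year": d.year}

parts = split_date(date(2024, 3, 5))
print(log_revenues, parts)
```
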
10. Data Consistency Checks

 Why?: You need to make sure your data is valid and follows the rules you expect
(e.g., no negative values for ages or prices).
 How?: Verify that the data follows the expected rules, then correct or flag any
violations (for example, a negative price) instead of silently keeping them.
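
A rule check can be as simple as a function that lists every violation it finds. The two rules below (no negative age, no negative price) come from the example in the text; the `rows` data and the function name `find_violations` are assumptions.

```python
def find_violations(rows):
    """Return (row_index, message) for every rule that a row breaks."""
    problems = []
    for i, row in enumerate(rows):
        if row["age"] < 0:
            problems.append((i, "age must not be negative"))
        if row["price"] < 0:
            problems.append((i, "price must not be negative"))
    return problems

rows = [
    {"age": 30, "price": 9.5},
    {"age": -4, "price": 12.0},
    {"age": 41, "price": -1.0},
]
violations = find_violations(rows)
print(violations)
```

In a relational database, similar rules can also be enforced at write time with constraints such as CHECK.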

11. Data Aggregation

 Why?: Sometimes, you need to combine data into a simpler form to make it more
useful for analysis.
 How?: You might combine data from different rows or columns into a single
summary, like calculating the total sales from individual product sales.
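
The total-sales example can be sketched as a group-and-sum in plain Python. The `sales` data is made up; in SQL the same result would typically come from `SUM` with `GROUP BY`.

```python
from collections import defaultdict

# (product, sale amount) pairs; several rows per product.
sales = [("pen", 2.5), ("book", 12.0), ("pen", 2.5), ("book", 8.0)]

# Sum the individual sales per product.
totals = defaultdict(float)
for product, amount in sales:
    totals[product] += amount

grand_total = sum(totals.values())
print(dict(totals), grand_total)
```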

By applying these techniques, you make sure that the data in your advanced
database is clean, consistent, and ready for more complex analysis, like generating
reports, building models, or making predictions.
