Lecture 22

The document outlines a lecture on Exploratory Data Analysis (EDA) and its significance in data analytics and visualization. It covers the definition of EDA, its importance, steps involved, and methods for detecting and handling outliers. Key techniques for visualization and use cases are also discussed, emphasizing the role of EDA in preparing data for machine learning models.

Data Analytics and Visualization
Course Code: CS2205

Dr. Rahul Mishra
IIT Patna

Agenda

1. What is Exploratory Data Analysis?
2. Why is EDA important?
3. Visualization
- Important charts for visualization.
4. Steps involved in EDA:
- Data Sourcing
- Data Cleaning
- Univariate analysis with visualization
- Bivariate analysis with visualization
- Derived Metrics
5. Use Cases
What is Exploratory Data Analysis

• Exploratory Data Analysis (EDA) is an approach to analyzing datasets that summarizes their main characteristics, often using visual methods.
• EDA is a data exploration technique for understanding various aspects of the data.
• The main aim of EDA is to gain enough confidence in the data that we are ready to apply a machine learning model.
• EDA is the first step in the data analysis process.
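
As a minimal sketch of what "summarizing main characteristics" can look like before any plotting, the snippet below computes basic descriptive statistics for one numeric column using only the Python standard library. The function name `summarize` and the sample `ages` list are hypothetical, chosen purely for illustration; they are not from the lecture.

```python
import statistics

def summarize(values):
    """Return basic descriptive statistics for one numeric column."""
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

# Hypothetical sample column, for illustration only.
ages = [23, 25, 31, 35, 41]
summary = summarize(ages)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A large gap between the mean and the median, or a maximum far from the rest of the values, is often the first hint that a column deserves a closer look.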

https://github.com/pik1989/EDA/blob/main/Feature_Scaling.ipynb

Introduction

• Outliers are extreme values in a dataset that deviate significantly from the norm. They do not fit the typical behavior of the data and can distort statistical analysis and machine learning models.
Detecting Outliers

1. Boxplot – Identifies outliers as points beyond the whiskers.
2. Histogram – Visualizes extreme values in the frequency distribution.
3. Scatter Plot – Outliers appear as points distant from the rest.
4. Z-score – Values more than 3 standard deviations from the mean (|z| > 3) indicate outliers.
5. Interquartile Range (IQR) – Values below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR are outliers, where IQR = Q3 − Q1.
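
The two numeric rules in the list above (Z-score and IQR) can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the lecture's own code: the function names, the sample `data`, the use of the population standard deviation, and the "inclusive" quartile convention are all assumptions made here.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumption)
    if std == 0:
        return []  # all values identical, so nothing can be an outlier
    return [v for v in values if abs(v - mean) / std > threshold]

def iqr_outliers(values, k=1.5):
    """Return values outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical sample: 20 typical readings plus one extreme value.
data = [10, 11, 12, 13, 14] * 4 + [100]

print(zscore_outliers(data))  # values with |z| > 3
print(iqr_outliers(data))     # values outside the IQR fences
```

Quartile conventions differ between tools, so borderline points may be flagged by one implementation and not another. The Z-score rule also relies on the mean and standard deviation, which the outliers themselves inflate; on small or heavily skewed samples the IQR rule is usually the more robust of the two.
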
Handling Outliers

1. Remove outliers if they result from data errors or significantly skew the analysis.
2. Replace outliers with adjusted values:
- Quantile Method: Cap outliers at chosen percentile values (for example, the 5th and 95th percentiles).
- Interquartile Range: Clip extreme values to the fences Q1 − 1.5 × IQR and Q3 + 1.5 × IQR.
3. Use ML models less sensitive to outliers:
- K-Nearest Neighbors (KNN)
- Decision Trees
- Support Vector Machines (SVM)
- Naïve Bayes
- Ensemble Methods
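
The first two handling options above (removing and replacing) can be sketched as follows, again with the standard library only. The function names, the 5th/95th percentile defaults, and the sample `data` are assumptions for illustration; the lecture does not prescribe specific cut-offs.

```python
import statistics

def iqr_fences(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def drop_outliers(values, k=1.5):
    """Option 1: remove values that fall outside the IQR fences."""
    lower, upper = iqr_fences(values, k)
    return [v for v in values if lower <= v <= upper]

def cap_at_percentiles(values, lower_pct=5, upper_pct=95):
    """Option 2a (quantile method): clamp values to a percentile range."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    lower, upper = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, lower), upper) for v in values]

def clip_to_iqr_fences(values, k=1.5):
    """Option 2b (IQR method): clamp values to the IQR fences."""
    lower, upper = iqr_fences(values, k)
    return [min(max(v, lower), upper) for v in values]

# Hypothetical sample: 20 typical readings plus one extreme value.
data = [10, 11, 12, 13, 14] * 4 + [100]

dropped = drop_outliers(data)
capped = cap_at_percentiles(data)
clipped = clip_to_iqr_fences(data)
print(len(data), len(dropped), max(capped), max(clipped))
```

Dropping changes the sample size, while capping and clipping keep every row and only pull the extremes inward. Which one is appropriate depends on whether the extreme value is a data error or a genuine observation.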
