Lecture 22

The document outlines a lecture on Exploratory Data Analysis (EDA) and its significance in data analytics and visualization. It covers the definition of EDA, its importance, steps involved, and methods for detecting and handling outliers. Key techniques for visualization and use cases are also discussed, emphasizing the role of EDA in preparing data for machine learning models.


Data Analytics and Visualization
Course Code: CS2205

Dr. Rahul Mishra
IIT Patna

Agenda

1. What is Exploratory Data Analysis?
2. Why is EDA important?
3. Visualization
- Important charts for visualization.
4. Steps involved in EDA:
- Data Sourcing
- Data Cleaning
- Univariate analysis with visualization
- Bivariate analysis with visualization
- Derived Metrics
5. Use Cases
What is Exploratory Data Analysis?

• Exploratory Data Analysis (EDA) is an approach to analyzing datasets that summarizes their main characteristics, often using visual methods.
• EDA is a data exploration technique for understanding various aspects of the data.
• The main aim of EDA is to gain enough confidence in the data that we are ready to apply a machine learning model.
• EDA is the first step in the data analysis process.
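
As a small illustration of "summarizing main characteristics", the sketch below computes a few descriptive statistics for one numeric column using only Python's standard library. The function name `summarize` and the sample values are assumptions made for this example; they do not come from the lecture.

```python
import statistics

def summarize(values):
    """Return basic descriptive statistics for a list of numbers.

    Hypothetical helper for illustration; not part of the lecture material.
    """
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

# Made-up sample column
ages = [2, 4, 4, 4, 5, 5, 7, 9]
for name, value in summarize(ages).items():
    print(f"{name}: {value}")
```

A gap between the mean and the median, or a maximum far from the rest of the values, is the kind of signal that a first pass like this is meant to surface before any modeling.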

https://github.com/pik1989/EDA/blob/main/Feature_Scaling.ipynb

Introduction

• Outliers are extreme values in a dataset that deviate significantly from the norm. They do
not fit within the normal behavior of data and can impact statistical analysis and machine
learning models.
Detecting Outliers

1. Boxplot – Identifies outliers as points beyond the whiskers.
2. Histogram – Visualizes extreme values in the frequency distribution.
3. Scatter Plot – Outliers appear as points distant from the rest of the data.
4. Z-score – Values more than 3 standard deviations from the mean (|z| > 3) indicate outliers.
5. Interquartile Range (IQR) – Values below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR are outliers.
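
The two numeric rules above (Z-score and IQR) can be sketched in a few lines of standard-library Python. The function names, the choice of the population standard deviation, the "inclusive" quartile method, and the sample data are all assumptions made for this illustration; other tools may compute quartiles slightly differently.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)  # population standard deviation (assumed choice)
    if stdev == 0:
        return []  # all values identical, so nothing can be an outlier
    return [v for v in values if abs((v - mean) / stdev) > threshold]

def iqr_outliers(values, k=1.5):
    """Return values below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Made-up sample: 19 readings between 10 and 13 plus one extreme value
readings = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11,
            12, 10, 11, 13, 12, 11, 10, 12, 11, 95]

print(zscore_outliers(readings))
print(iqr_outliers(readings))
```

Both rules flag the value 95 on this sample. In very small samples the Z-score rule may never fire, because a single extreme value also inflates the standard deviation; the IQR rule is less affected by that.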
Handling Outliers

1. Remove the outliers if they result from data errors or significantly skew analysis.
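2. Replace (cap) outliers using:
- Quantile Method: Replace outliers with chosen percentile values (for example, the 5th and 95th percentiles).
- Interquartile Range: Pull extreme values back to the IQR-based limits (Q1 − 1.5 × IQR and Q3 + 1.5 × IQR).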
3. Use ML models less sensitive to outliers:
- K-Nearest Neighbors (KNN)
- Decision Trees
- Support Vector Machines (SVM)
- Naïve Bayes
- Ensemble Methods
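
The replacement strategies in item 2 can be sketched as two capping functions. This is a minimal illustration using only the standard library; the function names, the 5th/95th percentile defaults, the "inclusive" quantile method, and the sample data are assumptions, not values prescribed by the lecture.

```python
import statistics

def cap_to_percentiles(values, lower_pct=5, upper_pct=95):
    """Quantile method: clamp values to the given lower/upper percentiles."""
    # quantiles(n=100) returns 99 cut points; index p - 1 is the p-th percentile
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, low), high) for v in values]

def clip_to_iqr_fences(values, k=1.5):
    """IQR method: clamp values to [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]

# Made-up sample with one extreme value
scores = [10, 11, 11, 12, 12, 12, 13, 13, 14, 95]

print(cap_to_percentiles(scores))
print(clip_to_iqr_fences(scores))
```

Capping keeps every row in the dataset, which matters when the other columns of an outlying record are still useful; removal (item 1) is the simpler choice when the value is a clear data error.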
