EDA Step by Step

This document is a step-by-step guide to Exploratory Data Analysis (EDA) in Python with pandas, numpy, matplotlib, and seaborn. It covers loading data, understanding its structure, checking missing values and duplicates, univariate and bivariate analysis, correlation, and outlier detection. It closes with a list of commonly used pandas syntax for data manipulation and analysis.

Uploaded by

Pranay Tandel

Step-by-Step EDA in Python

1. Import Required Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

2. Load the Data

df = pd.read_csv("data.csv") # Load CSV into a DataFrame
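
When no file is at hand, the loading step can be tried on an in-memory CSV. The sketch below is only an illustration: the CSV text is invented, and treating "?" as a missing-value marker is an assumption about how a file might encode gaps.

```python
import io

import pandas as pd

# Invented CSV content standing in for "data.csv"; "?" marks a missing salary.
csv_text = "age,gender,salary\n25,F,50000\n32,M,?\n41,F,72000\n"

# na_values tells read_csv which extra strings to treat as missing (NaN).
df = pd.read_csv(io.StringIO(csv_text), na_values=["?"])

print(df)
print(df.dtypes)
```

Without `na_values`, the "?" would force the salary column to be read as text, so declaring such markers at load time usually saves a later conversion step.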

3. Understand the Structure

df.shape # (rows, columns)
df.columns # Index of column names
df.info() # summary: datatypes, non-null counts, memory usage
df.head() # first 5 rows
df.tail() # last 5 rows

4. Data Types & Summary Stats

df.dtypes # check data types of each column
df.describe() # statistical summary (mean, min, max, quartiles) for numeric columns
df['col'].value_counts() # frequency of unique values in a column
df['col'].unique() # unique values
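
A small sketch of how these summary calls behave, using an invented five-row DataFrame (the values are made up for illustration). It also shows `normalize=True`, which turns the frequency counts into shares, and `nunique()`, which counts distinct values.

```python
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame({
    "gender": ["F", "M", "F", "F", "M"],
    "age": [25, 32, 47, 51, 38],
})

shares = df["gender"].value_counts(normalize=True)  # share of each category
n_unique = df["gender"].nunique()                   # number of distinct values
age_stats = df["age"].describe()                    # count, mean, std, min, quartiles, max

print(shares)
print("distinct genders:", n_unique)
print(age_stats)
```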

5. Missing Values & Duplicates

df.isnull().sum() # count missing values per column
df.duplicated().sum() # count duplicate rows
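
Raw counts are easier to judge next to percentages. The following is a minimal sketch on invented data with one missing value per column and one repeated row. It relies on the mean of a boolean mask being the fraction of True values.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration: one missing age, one missing salary,
# and row 3 repeats row 1 exactly.
df = pd.DataFrame({
    "age": [25, 32, np.nan, 32, 41],
    "salary": [50000, 64000, 58000, 64000, np.nan],
})

missing_count = df.isnull().sum()        # missing values per column
missing_pct = df.isnull().mean() * 100   # same, as a percentage of rows
duplicate_rows = int(df.duplicated().sum())  # rows identical to an earlier row

summary = pd.DataFrame({"missing": missing_count, "missing_pct": missing_pct})
print(summary)
print("duplicate rows:", duplicate_rows)
```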

6. Univariate Analysis

# Numerical Column
sns.histplot(df['age'], bins=20, kde=True)
plt.show()

# Categorical Column
sns.countplot(x='gender', data=df)
plt.show()

7. Bivariate Analysis

# Numerical vs Numerical
sns.scatterplot(x='age', y='salary', data=df)
plt.show()

# Numerical vs Categorical
sns.boxplot(x='gender', y='salary', data=df)
plt.show()

# Categorical vs Categorical
pd.crosstab(df['gender'], df['purchased'])
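
Counts in a cross-tabulation are hard to compare when the groups differ in size. As a sketch on invented data, `normalize="index"` converts each row to shares that sum to 1, so the purchase rate per gender can be read directly.

```python
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame({
    "gender": ["F", "F", "F", "M", "M", "M", "M", "F"],
    "purchased": ["yes", "no", "yes", "no", "no", "yes", "no", "yes"],
})

counts = pd.crosstab(df["gender"], df["purchased"])  # raw counts
row_shares = pd.crosstab(df["gender"], df["purchased"], normalize="index")  # per-row shares

print(counts)
print(row_shares)
```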

8. Correlation

corr = df.corr(numeric_only=True) # correlation matrix of the numeric columns
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.show()
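
With many columns a heatmap gets crowded, so a ranked list of column pairs can help. This is one possible sketch, not part of the original guide: it keeps the upper triangle of the matrix so each pair appears once, then sorts by absolute correlation. The data and the "score" column are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration; "gender" is non-numeric on purpose.
df = pd.DataFrame({
    "age": [25, 30, 35, 40, 45],
    "salary": [40000, 50000, 60000, 70000, 80000],
    "score": [3, 1, 4, 1, 5],
    "gender": ["F", "M", "F", "M", "F"],
})

corr = df.corr(numeric_only=True)  # non-numeric columns are skipped

# True strictly above the diagonal: drops self-correlations and mirrored pairs.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)

pairs = (
    corr.where(upper)   # everything outside the upper triangle becomes NaN
        .stack()        # long format: (column_a, column_b) -> correlation
        .dropna()       # explicit, since stack() does not always drop NaN
        .sort_values(key=lambda s: s.abs(), ascending=False)
)
print(pairs)
```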

9. Outlier Detection

sns.boxplot(x=df['salary'])
plt.show()
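
The box plot shows outliers visually. To list them as rows, a common companion is the 1.5 x IQR rule, which is also the default whisker rule in seaborn's boxplot. The sketch below uses invented salaries. The 1.5 multiplier is a convention, not a requirement.

```python
import pandas as pd

# Toy salaries, invented for illustration; the last value is deliberately extreme.
salary = pd.Series(
    [42000, 45000, 47000, 50000, 52000, 55000, 58000, 250000], name="salary"
)

q1 = salary.quantile(0.25)
q3 = salary.quantile(0.75)
iqr = q3 - q1                 # interquartile range

lower = q1 - 1.5 * iqr        # values below this are flagged
upper = q3 + 1.5 * iqr        # values above this are flagged

outliers = salary[(salary < lower) | (salary > upper)]
print("bounds:", lower, upper)
print(outliers)
```

Flagged values are candidates for a closer look, not automatic deletions. Whether to keep, cap, or drop them depends on the data.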

Most Commonly Used Pandas Syntax

df.head() -> Show first 5 rows
df.tail() -> Show last 5 rows
df.shape -> Get (rows, cols)
df.info() -> Data types + null counts
df.describe() -> Stats summary
df['col'].value_counts() -> Frequency of values
df.isnull().sum() -> Missing values
df.dropna() -> Drop rows that contain missing values
df.fillna(value) -> Fill missing values with the given value
df.drop_duplicates() -> Remove duplicate rows
df.sort_values(by='col') -> Sort by column
df.groupby('col').mean(numeric_only=True) -> Group by column and average the numeric columns
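
Several of the calls above are often chained. The following is a minimal sketch on invented data: it removes a duplicate row, fills a missing salary with the median (one possible choice, assumed here for illustration), then groups and sorts.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration: the last row duplicates the second,
# and one salary is missing.
df = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "M"],
    "salary": [52000, 60000, np.nan, 40000, 60000],
})

cleaned = df.drop_duplicates()
# Median fill is an assumption; the right fill value depends on the data.
cleaned = cleaned.assign(
    salary=cleaned["salary"].fillna(cleaned["salary"].median())
)

summary = (
    cleaned.groupby("gender")
           .agg(mean_salary=("salary", "mean"), n=("salary", "size"))
           .sort_values(by="mean_salary", ascending=False)
)
print(summary)
```

Named aggregation (`new_name=(column, function)`) keeps the output column names explicit, which helps when more than one statistic is computed per group.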
