Perform Exploratory Data Analysis

What is exploratory data analysis, and how do you perform it?

Uploaded by Abu Sufian

How to perform exploratory data analysis

Imagine you have a dataset of students' test scores and demographics.

Here's a simplified step-by-step approach:

1. Load Data:
o Read the data into a table.
o Example: students = pd.read_csv('students.csv')
2. Initial Exploration:
o Look at the first few rows: students.head()
o Check data structure: students.info()
3. Summary Statistics:
o Calculate the mean and median of the test scores.
o Count the unique values in the gender column.
4. Handle Missing Data:
o Identify missing entries: students.isnull().sum()
o Fill missing scores with the mean or remove those rows.
5. Visualize Data:
o Histogram of test scores.
o Box plot of test scores by gender.
o Scatter plot of test scores versus study hours.
6. Find Patterns:
o Calculate the correlation between study hours and test scores.
o Cross-tabulate test scores and extracurricular participation.
7. Identify Outliers:
o Use the interquartile range (IQR) rule to flag unusually high or low test scores (below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR).
o Use Z-scores to find test scores far from the mean (a common cutoff is |z| > 3).
8. Feature Engineering:
o Create a new feature combining study hours and class participation.
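The steps above can be sketched end to end on a tiny, made-up version of the students dataset. The column names (`gender`, `study_hours`, `participation`, `extracurricular`, `test_score`), the 70-point threshold used in the cross-tabulation, and the hypothetical `engagement` feature (study hours multiplied by participation) are all assumptions for illustration. The CSV text is read through `io.StringIO` so the sketch runs without a real `students.csv`, and the plotting step (5) is left out so it runs without a display.

```python
import io
import pandas as pd

# Hypothetical stand-in for 'students.csv'; every value is made up.
CSV = """gender,study_hours,participation,extracurricular,test_score
F,5,3,yes,78
M,2,1,no,55
F,8,4,yes,91
M,6,2,no,
F,3,2,no,62
M,7,5,yes,88
F,4,3,yes,70
M,1,1,no,12
"""

# 1. Load data
students = pd.read_csv(io.StringIO(CSV))

# 2. Initial exploration
print(students.head())
students.info()  # prints dtypes and non-null counts itself

# 3. Summary statistics
print(students['test_score'].mean(), students['test_score'].median())
print(students['gender'].nunique(), students['gender'].value_counts().to_dict())

# 4. Handle missing data: count NaNs per column, then fill scores with the mean
print(students.isnull().sum())
students['test_score'] = students['test_score'].fillna(students['test_score'].mean())

# 6. Find patterns
corr = students['study_hours'].corr(students['test_score'])
print(f'study_hours vs test_score correlation: {corr:.2f}')
passed = (students['test_score'] >= 70).rename('score_70_plus')  # assumed threshold
print(pd.crosstab(students['extracurricular'], passed))

# 7. Identify outliers with the IQR rule
q1, q3 = students['test_score'].quantile([0.25, 0.75])
iqr = q3 - q1
mask = (students['test_score'] < q1 - 1.5 * iqr) | (students['test_score'] > q3 + 1.5 * iqr)
print(students[mask])

# 8. Feature engineering: hypothetical "engagement" feature
students['engagement'] = students['study_hours'] * students['participation']
print(students[['study_hours', 'participation', 'engagement']])
```

With this data the single missing score is replaced by the mean of the other seven (456 / 7), and only the score of 12 falls outside the IQR bounds.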
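Performing exploratory data analysis (EDA) involves several steps, from understanding the structure of the data to summarizing its main characteristics. Below is a more general guide to performing EDA in Python with pandas, Matplotlib, Seaborn, and SciPy. Placeholder names such as `your_dataset.csv`, `column_name`, and `group_column` should be replaced with the file and columns of your own dataset.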

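```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

# Load data (replace the path with your own file)
df = pd.read_csv('your_dataset.csv')

# Understand data structure
print(df.head())      # first five rows
print(df.shape)       # (number of rows, number of columns)
df.info()             # column types and non-null counts; prints directly and returns None
print(df.describe())  # summary statistics for the numeric columns

# Data cleaning
df.dropna(inplace=True)           # drop every row that has at least one missing value
df.drop_duplicates(inplace=True)  # drop exact duplicate rows
```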
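The cleaning step above calls `dropna`, which by default drops every row containing at least one missing value, in any column. On real data that can discard far more rows than expected. This small made-up DataFrame (hypothetical columns `score` and `city`) contrasts it with filling one numeric column with its mean, the alternative mentioned in step 4 of the student example.

```python
import numpy as np
import pandas as pd

# Made-up data: two of the four rows have a missing value somewhere
df = pd.DataFrame({
    'score': [80.0, np.nan, 60.0, 70.0],
    'city': ['A', 'B', None, 'A'],
})

# Option 1: drop rows with any missing value (dropna's default behaviour)
dropped = df.dropna()

# Option 2: fill only the numeric column with its mean; all rows are kept
filled = df.copy()
filled['score'] = filled['score'].fillna(filled['score'].mean())

print(len(df), len(dropped), len(filled))  # 4 2 4
print(filled['score'].tolist())            # [80.0, 70.0, 60.0, 70.0]
```

Note that mean imputation leaves the missing `city` value untouched; each column needs its own strategy.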
```python
# Univariate analysis: distribution of a single numeric column
df['column_name'].hist(bins=30)
plt.show()

sns.boxplot(x=df['column_name'])
plt.show()

# Bivariate analysis: relationship between two numeric columns
plt.scatter(df['column_x'], df['column_y'])
plt.xlabel('column_x')
plt.ylabel('column_y')
plt.show()

# Correlation matrix; numeric_only=True skips text columns
# (required in pandas 2.0+ when the DataFrame has non-numeric columns)
correlation_matrix = df.corr(numeric_only=True)
sns.heatmap(correlation_matrix, annot=True)
plt.show()

# Categorical data analysis: frequency of each category
df['categorical_column'].value_counts().plot(kind='bar')
plt.show()

# Identifying outliers using IQR
Q1 = df['column_name'].quantile(0.25)
Q3 = df['column_name'].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
outliers = df[(df['column_name'] < lower_bound) | (df['column_name'] > upper_bound)]
print(outliers)
```
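To see what the 1.5 × IQR bounds do, here is the same rule applied to a short made-up series. `Series.quantile` interpolates linearly by default, so Q1 and Q3 need not be values that actually appear in the data.

```python
import pandas as pd

# Made-up scores with one suspiciously high value
scores = pd.Series([10, 12, 13, 15, 16, 18, 19, 95])

q1 = scores.quantile(0.25)  # linear interpolation between 12 and 13
q3 = scores.quantile(0.75)  # linear interpolation between 18 and 19
iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

outliers = scores[(scores < lower_bound) | (scores > upper_bound)]
print(q1, q3, iqr)               # 12.75 18.25 5.5
print(lower_bound, upper_bound)  # 4.5 26.5
print(outliers.tolist())         # [95]
```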

```python
# Identifying outliers using Z-scores (distance from the mean in standard deviations)
df['z_score'] = stats.zscore(df['column_name'])
outliers = df[df['z_score'].abs() > 3]
print(outliers)
```
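`stats.zscore` computes (x - mean) / std using the population standard deviation (`ddof=0`) by default. The made-up example below reproduces it with plain pandas and illustrates a caveat: in a small sample the outlier itself inflates the standard deviation. By Samuelson's inequality, with `ddof=0` no point can have |z| greater than sqrt(n - 1), so the |z| > 3 cutoff can never flag anything when n is 10 or fewer, even though the IQR rule flags 95 in the same data.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Made-up scores: 95 is clearly unusual
scores = pd.Series([10, 12, 13, 15, 16, 18, 19, 95])

# Manual z-score with the population standard deviation, matching scipy's default ddof=0
manual_z = (scores - scores.mean()) / scores.std(ddof=0)
scipy_z = stats.zscore(scores)

print(np.allclose(manual_z, scipy_z))  # True
print(round(manual_z.max(), 2))        # 2.63, below the usual cutoff of 3
print(np.sqrt(len(scores) - 1))        # upper bound on |z| for n = 8 (about 2.65)
```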

```python
# Feature engineering: combine two existing columns into a new one
df['new_feature'] = df['feature1'] + df['feature2']

# Visualizing relationships: pairwise scatter plots of the numeric columns
sns.pairplot(df)
plt.show()

# Hypothesis testing: do two groups differ in their mean?
group1 = df.loc[df['group_column'] == 'group1', 'numeric_column']
group2 = df.loc[df['group_column'] == 'group2', 'numeric_column']
t_stat, p_value = stats.ttest_ind(group1, group2)
print(f'T-statistic: {t_stat}, P-value: {p_value}')
```
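`ttest_ind` assumes the two groups have equal variances unless told otherwise; passing `equal_var=False` runs Welch's t-test instead, which is often the safer choice when the group spreads differ. This sketch uses made-up groups drawn from normal distributions with a fixed seed; the group sizes, means, spreads, and the 0.05 threshold are arbitrary assumptions, so the printed numbers are illustrative only.

```python
import numpy as np
from scipy import stats

# Made-up data: group B has a higher true mean and a larger spread
rng = np.random.default_rng(seed=0)
group_a = rng.normal(loc=70, scale=5, size=40)
group_b = rng.normal(loc=75, scale=10, size=25)

# Student's t-test (equal variances assumed; scipy's default)
t_student, p_student = stats.ttest_ind(group_a, group_b)

# Welch's t-test (no equal-variance assumption)
t_welch, p_welch = stats.ttest_ind(group_a, group_b, equal_var=False)

print(f'Student: t={t_student:.2f}, p={p_student:.4f}')
print(f'Welch:   t={t_welch:.2f}, p={p_welch:.4f}')

# A common but arbitrary convention: p < 0.05 counts as evidence of a difference
alpha = 0.05
print('difference detected' if p_welch < alpha else 'no significant difference')
```

A small p-value only says the observed gap in means would be unlikely if the true means were equal; it says nothing about how large or practically important the difference is.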

This workflow provides a structured approach to performing EDA, helping you understand the dataset's
characteristics and relationships before moving on to more complex analysis or modeling.
