Chapter: 04

EXPLORATORY DATA ANALYSIS USING PYTHON

Mohd Hyder Gouri*

Faculty, Glocal School of Science and Technology,
Glocal University, Saharanpur, U.P.
*Correspondence to: [email protected]

Mohd Nafees
Faculty, Glocal School of Science and Technology,
Glocal University, Saharanpur, U.P.

DOI: https://doi.org/10.52458/9789388996747.nsp2023.eb.ch-04

Ch.Id:-GU/NSP/EB/EFMLDSP/2023/Ch-04

ABSTRACT
Exploratory data analysis (EDA) is a critical phase of the data analysis process in which
data scientists and analysts examine, visualize, and draw insights from their datasets.
With its extensive ecosystem of data manipulation and visualization libraries, Python has
become a powerful tool for this work. This chapter introduces Python-based EDA techniques,
highlights the role of EDA in the data analysis pipeline, and presents approaches to data
visualization, summary statistics, and statistical testing.

Keywords: Exploratory Data Analysis, EDA, Python, Data Analysis, Data Visualization,
Summary Statistics, Data Exploration, Data Insights, Data Science

E-ISBN- 978-93-88996-74-7, P-ISBN- 978-93-88996-92-1 28

Engineering the Future: Machine Learning and Data Science in Practice

4.1 INTRODUCTION
The growing availability of large and complex datasets has caused a paradigm
shift in the field of data analysis. In today's big data age, understanding your data is
essential. Data analysis begins with exploratory data analysis (EDA), which acts as a
compass for navigating the data. It entails methodically inspecting, summarizing, and
visualizing data in order to spot trends and draw out preliminary conclusions. Python,
a flexible and widely used programming language, has become a popular choice for
EDA because of its extensive libraries and interactive tooling.

This chapter aims to offer a thorough overview of Python-based EDA. It covers
a range of methods, resources, and best practices for effective data exploration.
Besides giving a basic understanding of the data, EDA helps reveal hidden patterns,
anomalies, and possible directions for further research. The chapter shows how to
load, visualize, and analyze data using Python tools such as pandas, matplotlib,
seaborn, and plotly. It also explores statistical testing, hypothesis validation, and
the importance of data preprocessing for effective EDA.

4.2 LITERATURE REVIEW

The value of EDA in the data analysis process is well known, and it has been
covered in great detail in the literature. John Tukey and Francis Anscombe, two well-
known statisticians, were among the early proponents of the concept of visual data
exploration, stressing the effectiveness of graphical tools in highlighting data patterns
and outliers.

In the modern setting, Python's user-friendly syntax and its wealth of modules
designed for data analysis have made the language extremely popular for EDA.
Libraries such as pandas, NumPy, matplotlib, and seaborn let data scientists and
analysts manipulate and visualize data easily.

EDA in Python has been the subject of several publications, training programs,
and books, underscoring its importance in the fields of data science and machine
learning. As data-driven decision-making expands and changes across numerous
industries, EDA remains a crucial step in bridging the gap between unprocessed data
and useful insights. This chapter builds on the ideas introduced by those
contributions and seeks to give readers a practical, hands-on introduction to
Python-based EDA.

4.3 EDA
Exploratory data analysis (EDA) is an important part of the data analysis
process that helps us understand and make sense of our data. By analyzing the data
visually and statistically, we can find patterns, relationships, anomalies, and insights.
These findings guide further analysis and decision-making. Python is a great option
for EDA because of its robust ecosystem of libraries. This chapter looks at a variety of
Python modules and methods for doing EDA efficiently.

4.4 CONSTRUCTING THE ENVIRONMENT

Before beginning EDA, you need to set up your Python environment. Jupyter
Notebook or JupyterLab is recommended because both offer a collaborative and
interactive environment for data analysis.

The following Python libraries need to be installed:

• NumPy: For manipulating arrays and performing numerical computations.

• Pandas: For analyzing and manipulating data.

• Matplotlib: For fundamental data visualization.

• Seaborn: For producing more sophisticated and appealing visualizations.

• Plotly: For interactive, web-based visualizations.

• Jupyter Widgets (ipywidgets): For making your EDA notebooks more interactive.

• Missingno: For visualizing missing data.

• SciPy: For statistical analysis and testing.

You can install these libraries using pip:
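
As a minimal sketch, assuming the package names on PyPI match the import names used below, the following snippet checks which of the listed libraries are already available and prints a pip command for the rest. The helper name `missing_packages` is hypothetical and introduced only for illustration.

```python
import importlib.util

# Import names of the libraries listed above
# (assumed to be identical to their PyPI package names).
REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn",
            "plotly", "ipywidgets", "missingno", "scipy"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


to_install = missing_packages(REQUIRED)
if to_install:
    # Run the printed command in a terminal to install what is missing.
    print("pip install " + " ".join(to_install))
else:
    print("All required libraries are already installed.")
```

If nothing is installed yet, the printed command lists all eight packages, which is the full installation step for this chapter.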


4.5 UNDERSTANDING AND ADDING DATA

Loading the data is the first step in the EDA process. The pandas package is the
standard tool for importing and processing data in Python. It can load data from CSV
files, Excel workbooks, SQL databases, and other formats. A CSV file can be loaded as follows:
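
A minimal sketch of this step, assuming a small table of made-up employee records. The CSV text is held in memory so the example runs without a file on disk; the column names and values are illustrative only.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would live in a file on disk.
csv_text = """name,age,city,salary
Asha,34,Delhi,52000
Ravi,29,Mumbai,48000
Meena,41,Delhi,
Imran,,Lucknow,39000
"""

# read_csv accepts a file path or any file-like object.
# With a real file, the call would be: df = pd.read_csv("data.csv")
df = pd.read_csv(io.StringIO(csv_text))

print(df)
```

Empty fields are read as missing values (NaN). For the other formats mentioned above, pandas offers `pd.read_excel` and `pd.read_sql`.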

Once you have your data loaded, start by understanding its basic characteristics:
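
The usual first checks can be sketched as follows. The small DataFrame is made up for illustration and stands in for whatever dataset has been loaded.

```python
import numpy as np
import pandas as pd

# Small made-up dataset standing in for the loaded data.
df = pd.DataFrame({
    "age": [34, 29, 41, np.nan, 37],
    "city": ["Delhi", "Mumbai", "Delhi", "Lucknow", "Mumbai"],
    "salary": [52000.0, 48000.0, np.nan, 39000.0, 61000.0],
})

print(df.head())                  # first rows of the table
print(df.shape)                   # (number of rows, number of columns)
df.info()                         # column types and non-null counts (prints directly)
print(df.describe())              # summary statistics for the numeric columns
print(df.isna().sum())            # number of missing values per column
print(df["city"].value_counts())  # frequency of each category
```

Together these calls answer the basic questions: how large the dataset is, what type each column has, where values are missing, and how the numeric and categorical columns are distributed.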

4.6 DATA VISUALIZATION

Data visualization is an effective EDA technique. Python has a variety of
libraries available for building both static and interactive charts. For this chapter, we'll
concentrate on matplotlib, seaborn, and plotly.

4.7 BASIC MATPLOTLIB VISUALIZATION

Matplotlib is a flexible package that may be used to make a variety of static plots. Here's
an easy illustration:
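
One possible illustration, using synthetic data generated with a fixed seed. The variable names, the plot choices (a histogram and a line plot), and the output file name are assumptions made for this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data: 200 values drawn from a normal distribution (fixed seed).
rng = np.random.default_rng(0)
values = rng.normal(loc=50, scale=10, size=200)

fig, (ax_hist, ax_line) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: shows the shape of a single variable's distribution.
ax_hist.hist(values, bins=20, color="steelblue", edgecolor="white")
ax_hist.set_title("Distribution of values")
ax_hist.set_xlabel("Value")
ax_hist.set_ylabel("Frequency")

# Line plot: shows the values in index order.
ax_line.plot(values, color="darkorange")
ax_line.set_title("Values by index")
ax_line.set_xlabel("Index")
ax_line.set_ylabel("Value")

fig.tight_layout()
fig.savefig("basic_plots.png")  # in a notebook, plt.show() displays the figure instead
```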


4.8 SEABORN'S ENHANCED VISUALIZATION

Seaborn, which is built on top of Matplotlib, provides a high-level interface for
creating visually appealing statistical plots:
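
A short sketch of two common Seaborn plots, a box plot and a scatter plot colored by category. The DataFrame and its column names are invented for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data: salary (in thousands) by department and years of experience.
df = pd.DataFrame({
    "department": ["Sales", "Sales", "Sales", "IT", "IT", "IT", "HR", "HR", "HR"],
    "experience": [2, 5, 9, 1, 4, 8, 3, 6, 10],
    "salary": [30, 42, 58, 35, 50, 72, 28, 39, 51],
})

sns.set_theme(style="whitegrid")  # apply Seaborn's default styling

fig, (ax_box, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Box plot: compares the salary distribution across departments.
sns.boxplot(data=df, x="department", y="salary", ax=ax_box)

# Scatter plot: salary against experience, colored by department.
sns.scatterplot(data=df, x="experience", y="salary", hue="department", ax=ax_scatter)

fig.tight_layout()
fig.savefig("seaborn_plots.png")
```

Seaborn takes the column names directly and derives axis labels and the legend from them, which is where most of the saving over plain Matplotlib comes from.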

4.9 PLOTLY'S INTERACTIVE VISUALIZATION

Plotly is well suited to building interactive charts that support zooming, panning, and
hover tooltips. It integrates smoothly with Jupyter Notebook:

4.10 HANDLING MISSING DATA

Dealing with missing data is a critical aspect of EDA. The third-party missingno
library visualizes missing values, and pandas provides the methods to handle them:


4.11 ANALYZING RELATIONSHIPS

Understanding relationships between variables is key to gaining insights. You can use
correlation matrices, scatter plots, and pair plots for this:

4.12 STATISTICAL ANALYSIS

Statistical analysis can provide valuable insights into your data. Python's scipy library is
a great choice for conducting statistical tests:
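
A brief sketch of three commonly used tests from `scipy.stats`, run on synthetic samples generated with a fixed seed. The choice of tests and the sample parameters are assumptions for illustration.

```python
import numpy as np
from scipy import stats

# Two synthetic groups whose true means differ (fixed seed).
rng = np.random.default_rng(7)
group_a = rng.normal(loc=50, scale=5, size=40)
group_b = rng.normal(loc=58, scale=5, size=40)

# Shapiro-Wilk test: null hypothesis is that the sample is normally distributed.
shapiro_stat, shapiro_p = stats.shapiro(group_a)

# Welch's t-test: compares two group means without assuming equal variances.
t_stat, t_p = stats.ttest_ind(group_a, group_b, equal_var=False)

# Pearson correlation: strength of the linear relationship between x and y.
x = rng.uniform(0, 10, size=40)
y = 3 * x + rng.normal(0, 2, size=40)
r, r_p = stats.pearsonr(x, y)

print(f"Shapiro-Wilk: W={shapiro_stat:.3f}, p={shapiro_p:.3f}")
print(f"Welch t-test: t={t_stat:.3f}, p={t_p:.3g}")
print(f"Pearson: r={r:.3f}, p={r_p:.3g}")
```

A small p-value in the t-test suggests that the group means differ. In EDA such results are best treated as pointers for further investigation rather than as final conclusions.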

4.13 CONCLUSION
Exploratory data analysis (EDA) is a crucial stage of the data analysis process,
and Python offers a robust toolkit for carrying it out. By visualizing data, dealing
with missing values, and performing statistical analysis, you can obtain insights
that direct your further analysis and decision-making. Keep in mind that EDA is not
a one-time effort; it is an iterative process that develops as you explore your data
more deeply. With the approaches and resources discussed in this chapter, you are
prepared to begin using Python for your own EDA work.

REFERENCES
1. Tukey, John W. "Exploratory Data Analysis." Addison-Wesley, 1977.
   John Tukey's seminal work, which introduced the concept of EDA and laid the foundation
   for modern data analysis techniques.

2. Anscombe, Francis J. "Graphs in Statistical Analysis." The American Statistician, 1973.
   Francis Anscombe's paper that underscores the importance of visualizations and provides
   the famous Anscombe's Quartet as an example.

3. McKinney, Wes. "Python for Data Analysis." O'Reilly Media, 2017.
   A comprehensive book that focuses on data analysis with Python, including a detailed
   section on EDA using pandas.

4. VanderPlas, Jake. "Python Data Science Handbook." O'Reilly Media, 2016.
   This book covers various aspects of data science in Python, with a focus on EDA, data
   visualization, and analysis techniques.

5. Wickham, Hadley. "ggplot2: Elegant Graphics for Data Analysis." Springer, 2016.
   While primarily focused on R, this book introduces the grammar of graphics and provides
   valuable insights into data visualization principles, which can be adapted to Python with
   libraries like seaborn.

6. Sahoo, K., Samal, A. K., Pramanik, J., & Pani, S. K. (2019). Exploratory data analysis
   using Python. International Journal of Innovative Technology and Exploring
   Engineering, 8(12), 4727-4735.

7. Samet, R., & Tural, S. (2010). Web based real-time meteorological data analysis and
   mapping information system. Proceedings of WSEAS Transactions of Information
   Science and Applications, 1115-1125.
