Dr. Rafiq Zakaria Campus
P.G. Dept. of Computer Science
M.Sc. III Semester
Data mining and Data warehousing
Practical 02
Aim : To study exploratory data analysis using Python.
Description :
Exploratory Data Analysis (EDA) is a crucial step in the data mining process. It helps
in understanding the underlying patterns in the data and informs subsequent analysis
or modeling. Here's a practical guide on performing EDA using Python, utilizing
popular libraries such as Pandas, Matplotlib, and Seaborn.
Step-by-Step EDA in Python
1. Import Libraries
Start by importing the necessary libraries.
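A typical import block for this practical might look like the sketch below. The aliases pd, plt and sns are common conventions rather than requirements, and the version printout is an optional addition that is not part of the original handout.

```python
# Conventional aliases; the names pd, plt and sns are customary, not mandatory.
import pandas as pd               # tabular data handling (DataFrame, Series)
import matplotlib.pyplot as plt   # general-purpose plotting
import seaborn as sns             # statistical plots built on Matplotlib

# Optional: print the installed versions so results can be reproduced later.
print("pandas:", pd.__version__)
print("seaborn:", sns.__version__)
```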
2. Load the Data
Load your dataset into a Pandas DataFrame.
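As a minimal sketch of this step, the example below reads CSV text into a DataFrame with pd.read_csv. The column names and values are hypothetical stand-ins chosen for illustration. In the practical you would normally pass the path of your own CSV file instead of the in-memory text.

```python
import io
import pandas as pd

# Hypothetical stand-in data for illustration only; with a real file you
# would typically write something like pd.read_csv("your_file.csv").
csv_text = """name,age,gender,fare
Asha,22,female,7.25
Ravi,38,male,71.28
Meera,,female,7.92
Imran,35,male,53.10
Imran,35,male,53.10
"""

# read_csv accepts a path or any file-like object; StringIO keeps the
# example self-contained.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

The sample deliberately contains one missing age and one duplicated row, so it can be reused when practising the cleaning step.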
3. Understand the Data Structure
Use basic commands to explore the dataset.
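The basic commands usually meant here are head, shape, info, dtypes and describe. The sketch below applies them to a small hypothetical DataFrame (the columns and values are assumptions made for illustration).

```python
import pandas as pd

# Small hypothetical DataFrame used only to demonstrate the commands.
df = pd.DataFrame({
    "age": [22.0, 38.0, None, 35.0],
    "gender": ["female", "male", "female", "male"],
    "fare": [7.25, 71.28, 7.92, 53.10],
})

print(df.head())       # first rows (up to 5 by default)
print(df.shape)        # (number of rows, number of columns)
df.info()              # column names, non-null counts and dtypes (prints directly)
print(df.dtypes)       # data type of each column
print(df.describe())   # summary statistics for the numeric columns
```

Note that describe reports a count of 3 for age because the missing value is excluded, which is often the first hint that cleaning is needed.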
4. Data Cleaning
Handle missing values and duplicates.
# Check for missing values in each column
print(df.isnull().sum())
# Fill or drop missing values
df = df.ffill()  # Example: forward fill; fillna(method='ffill') is deprecated in recent pandas versions
df = df.drop_duplicates()  # Remove exact duplicate rows
5. Univariate Analysis
Analyze individual features.
● Categorical Features
● Numerical Features
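One possible way to examine both kinds of feature is sketched below: a frequency count and bar chart for a categorical column, and summary statistics with a histogram for a numerical column. The data is hypothetical, and the plots use the pandas plotting interface on top of Matplotlib. Seaborn's countplot and histplot are alternatives for the same two charts.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "gender": ["female", "male", "female", "male", "male"],
    "age": [22, 38, 26, 35, 54],
})

fig, (ax_cat, ax_num) = plt.subplots(1, 2, figsize=(10, 4))

# Categorical feature: frequency of each category, drawn as a bar chart.
gender_counts = df["gender"].value_counts()
print(gender_counts)
gender_counts.plot(kind="bar", ax=ax_cat, title="Count by gender")

# Numerical feature: summary statistics, then a histogram of the distribution.
print(df["age"].describe())
df["age"].plot(kind="hist", bins=5, ax=ax_num, title="Age distribution")
ax_num.set_xlabel("age")

fig.tight_layout()
# In a script or notebook, finish with plt.show() to display the figure.
```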
Example Dataset
You can use any dataset for this analysis. Popular choices include the Titanic and Iris
datasets, both of which are available in Seaborn or can be downloaded from Kaggle.
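For the Seaborn route, a minimal sketch is shown below. The helper name load_example is hypothetical and is introduced only for illustration. It wraps sns.load_dataset, which fetches the sample data online, so an internet connection is assumed unless the file is already cached locally.

```python
import seaborn as sns

def load_example(name="titanic"):
    """Return one of Seaborn's sample datasets as a pandas DataFrame.

    Assumption: an internet connection is available on first use,
    because load_dataset downloads the data before caching it.
    """
    return sns.load_dataset(name)
```

For example, load_example("iris").head() would show the first five rows of the Iris dataset, and load_example("titanic") would return the Titanic passenger data.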
Conclusion
EDA is an iterative process; you may need to go back and forth to refine your
analysis based on findings. The insights gained during EDA are invaluable for guiding
your data mining efforts, model selection, and feature engineering.
Output :
EDA is studied using Python libraries.
Prepared By : Khan Shagufta (PG Dept Of Comp Sci)