Department of AI & DS Engineering
List of Experiments
SUBJECT: Data Analysis Lab
CLASS: SY (A & B) SEMESTER: 4th
Academic Year : 2023-24
Sr. No.   Title of Experiment   CO   PO   PSO
1. Study and implementation of Pandas Profiling, Sweetviz, and AutoViz. CO4
2. Data Wrangling I CO4, CO5
Perform the following operations using Python on any open source
dataset (e.g., data.csv):
1. Import all the required Python Libraries.
2. Locate open source data from the web (e.g.,
https://www.kaggle.com). Provide a clear description
of the data and its source (i.e., URL of the web site).
3. Load the dataset into a pandas DataFrame.
4. Data preprocessing: check for missing values in the data using
the pandas isnull() function, and use describe() to get some
initial statistics. Provide variable descriptions and the types
of the variables. Check the dimensions of the DataFrame.
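The steps above can be sketched as follows. This is a minimal illustration, not a prescribed solution: the inline CSV text and its column names (name, age, city, income) are hypothetical stand-ins for whatever data.csv the student downloads.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv so the sketch runs without a download.
CSV_TEXT = """name,age,city,income
Asha,23,Pune,42000
Ravi,,Mumbai,51000
Meera,31,,38000
John,27,Nashik,
"""

# In the lab this would be pd.read_csv("data.csv").
df = pd.read_csv(io.StringIO(CSV_TEXT))

missing_per_column = df.isnull().sum()  # count of missing values per column
summary = df.describe()                 # initial statistics for numeric columns
column_types = df.dtypes                # type of each variable
n_rows, n_cols = df.shape               # dimensions of the DataFrame

print(missing_per_column)
print(summary)
print(column_types)
print(f"rows={n_rows}, columns={n_cols}")
```

Note that describe() only summarizes numeric columns by default; passing include="all" also covers text columns.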
3. CO4, CO5
Create an “Academic performance” dataset of students and perform the
following operations using Python.
1. Scan all variables for missing values and inconsistencies. If
there are missing values and/or inconsistencies, use any of
the suitable techniques to deal with them.
2. Scan all numeric variables for outliers. If there are outliers,
use any of the suitable techniques to deal with them.
3. Apply data transformations on at least one of the variables.
The purpose of this transformation should be one of the
following: to change the scale for better understanding of the
variable, to convert a non-linear relation into a linear one,
or to decrease the skewness and bring the distribution closer
to a normal distribution.
Explain your reasoning and document your approach properly.
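One possible way to approach the three operations is sketched below. The dataset, its column names, and all values are made up for illustration. Median imputation, IQR-based capping, and a log transform are only one choice among the "suitable techniques" the experiment allows.

```python
import numpy as np
import pandas as pd

# Hypothetical "Academic performance" data; every value is invented.
df = pd.DataFrame({
    "student": ["S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8"],
    "math_score": [72, 65, np.nan, 80, 58, 77, 69, 300],  # 300 is an invalid entry
    "attendance_pct": [91, 85, 78, np.nan, 88, 95, 60, 82],
    "study_hours_per_week": [2, 3, 2, 4, 3, 5, 40, 3],    # right-skewed
})

# 1. Missing values: fill numeric gaps with each column's median.
numeric_cols = ["math_score", "attendance_pct", "study_hours_per_week"]
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())


def iqr_bounds(series, k=1.5):
    """Return the lower and upper fences of the k * IQR rule."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# 2. Outliers: flag values outside the fences, then cap them at the fences.
low, high = iqr_bounds(df["math_score"])
outlier_mask = (df["math_score"] < low) | (df["math_score"] > high)
df["math_score"] = df["math_score"].clip(lower=low, upper=high)

# 3. Transformation: log(1 + x) to reduce right skew.
skew_before = df["study_hours_per_week"].skew()
df["log_study_hours"] = np.log1p(df["study_hours_per_week"])
skew_after = df["log_study_hours"].skew()

print(f"outliers capped: {int(outlier_mask.sum())}")
print(f"skewness before={skew_before:.2f}, after={skew_after:.2f}")
```

Capping keeps every row; dropping the flagged rows or replacing them with the median are equally valid alternatives, as long as the choice is justified in the write-up.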
4. CO4, CO5
Perform the following operations on any open source dataset (e.g.,
data.csv):
1. Provide summary statistics (mean, median, minimum, maximum,
standard deviation) for a dataset (age, income, etc.) with
numeric variables grouped by one of the qualitative
(categorical) variables. For example, if your categorical
variable is age group and your quantitative variable is income,
then provide summary statistics of income grouped by age group.
Create a list that contains a numeric value for each response
to the categorical variable.
2. Write a Python program to display some basic statistical details
such as percentiles, mean, and standard deviation for the species
‘Iris-setosa’, ‘Iris-versicolor’ and ‘Iris-virginica’ of the
iris.csv dataset.
Provide the code with outputs and explain everything that you do in
this step.
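Both parts of this experiment can be sketched with pandas groupby. The survey rows below are invented. The iris rows are a small illustrative sample, and the column names are assumptions about how iris.csv is laid out. Using pd.factorize for the "list of numeric values" is one interpretation of that requirement.

```python
import pandas as pd

# Hypothetical survey data: one categorical and one numeric variable.
people = pd.DataFrame({
    "age_group": ["18-30", "18-30", "31-45", "31-45", "46-60", "46-60"],
    "income": [30000, 40000, 55000, 65000, 70000, 50000],
})

# Part 1: summary statistics of income grouped by age group.
income_by_group = people.groupby("age_group")["income"].agg(
    ["mean", "median", "min", "max", "std"]
)

# One numeric code per response of the categorical variable.
codes, categories = pd.factorize(people["age_group"])
age_group_codes = codes.tolist()

# Part 2: per-species statistics. In the lab: iris = pd.read_csv("iris.csv").
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "species": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
                "Iris-versicolor", "Iris-virginica", "Iris-virginica"],
})

# describe() reports count, mean, std, min, 25%, 50%, 75% and max per species.
species_stats = iris.groupby("species").describe()

print(income_by_group)
print(age_group_codes)
print(species_stats["sepal_length"])
```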
5. CO4, CO5
1. Use the inbuilt dataset 'titanic'. The dataset contains 891 rows
with information about the passengers who boarded the ill-fated
Titanic. Use the Seaborn library to look for patterns in the data.
2. Write code to check how the price of the ticket (column name:
'fare') for each passenger is distributed by plotting a histogram.
6. CO4, CO5
Use the inbuilt dataset 'titanic' as used in the above problem. Plot a
box plot of the distribution of age with respect to each gender, along
with the information about whether they survived or not (column
names: 'sex' and 'age').
Write observations on the inferences drawn from the above statistics.
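The box plot can be sketched as below. The 'survived' column is assumed to hold the survival flag, as it does in Seaborn's copy of the dataset. The rows are invented stand-ins so the sketch runs without a download, and the helper name is hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script also runs headless

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_age_by_sex_and_survival(df):
    """Box plot of age per gender, split by the survival flag."""
    fig, ax = plt.subplots(figsize=(6, 4))
    sns.boxplot(data=df, x="sex", y="age", hue="survived", ax=ax)
    ax.set_title("Age by gender and survival")
    return ax


# Stand-in rows for illustration only.
# In the lab: titanic = sns.load_dataset("titanic")
sample = pd.DataFrame({
    "sex": ["male", "male", "male", "male", "female", "female", "female", "female"],
    "age": [22, 35, 54, 4, 38, 26, 58, 14],
    "survived": [0, 0, 1, 1, 1, 1, 0, 0],
})

ax = plot_age_by_sex_and_survival(sample)

# Median age per group gives numbers to quote in the written observations.
median_age = sample.groupby(["sex", "survived"])["age"].median()
print(median_age)
```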
7. Data Visualization III CO4, CO5
Download the Iris flower dataset or any other dataset into a DataFrame
(e.g., https://archive.ics.uci.edu/ml/datasets/Iris). Scan the dataset
and give the inference as:
a. List down the features and their types (e.g., numeric, nominal)
available in the dataset.
b. Create a histogram for each feature in the dataset to illustrate
the feature distributions.
c. Create a boxplot for each feature in the dataset.
Compare distributions and identify outliers.
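Steps a to c can be sketched as follows. As an assumption for offline use, the sketch loads the copy of Iris bundled with scikit-learn rather than downloading from the UCI page. The 1.5 x IQR rule used to count outliers is one common convention, not the only one.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script also runs headless

import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris

# Bundled copy of Iris, used here as a stand-in for the downloaded file.
bunch = load_iris(as_frame=True)
df = bunch.frame.copy()
df["species"] = df.pop("target").map(dict(enumerate(bunch.target_names)))
feature_cols = list(bunch.feature_names)

# a. Features and their types.
feature_types = {
    col: "numeric" if pd.api.types.is_numeric_dtype(df[col]) else "nominal"
    for col in df.columns
}

# b. One histogram per feature (returns a 2x2 grid of Axes).
hist_axes = df[feature_cols].hist(bins=15, figsize=(8, 6))

# c. One box plot per feature on a shared Axes.
fig, box_ax = plt.subplots(figsize=(8, 4))
df[feature_cols].boxplot(ax=box_ax)

# Count values outside the 1.5 * IQR fences for each feature.
q1 = df[feature_cols].quantile(0.25)
q3 = df[feature_cols].quantile(0.75)
iqr = q3 - q1
outside = (df[feature_cols] < q1 - 1.5 * iqr) | (df[feature_cols] > q3 + 1.5 * iqr)
outlier_counts = outside.sum()

print(feature_types)
print(outlier_counts)
```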
8. Implement a mini project on predicting stock prices using Pandas and
Scikit-learn. CO4, CO5
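A possible starting point for the mini project is sketched below. It is an illustration under stated assumptions: the prices are a synthetic random walk rather than real market data, and the lag features, 5-day moving average, and linear regression model are illustrative choices, not a prescribed method.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Synthetic closing prices (random walk); replace with real data in the project.
rng = np.random.default_rng(0)
dates = pd.bdate_range("2023-01-02", periods=200)
prices = pd.DataFrame(
    {"close": 100 + np.cumsum(rng.normal(0, 1, size=200))}, index=dates
)

# Features built only from past values, so the model never sees the future.
for lag in (1, 2, 3):
    prices[f"lag_{lag}"] = prices["close"].shift(lag)
prices["ma_5"] = prices["close"].shift(1).rolling(5).mean()
data = prices.dropna()

features = ["lag_1", "lag_2", "lag_3", "ma_5"]
split = int(len(data) * 0.8)  # chronological split, no shuffling
X_train, X_test = data[features].iloc[:split], data[features].iloc[split:]
y_train, y_test = data["close"].iloc[:split], data["close"].iloc[split:]

model = LinearRegression().fit(X_train, y_train)
predictions = model.predict(X_test)
mae = mean_absolute_error(y_test, predictions)

print(f"test rows={len(y_test)}, mean absolute error={mae:.3f}")
```

The split is chronological because shuffling time-series rows would let information from later dates leak into training.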
9. Implement color detection using Pandas, AutoViz, and Sweetviz. CO4, CO5
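One common reading of "color detection" with Pandas is to look up the named color closest to a given RGB value. The sketch below assumes that reading. The small color table and the function name nearest_color_name are hypothetical. The AutoViz and Sweetviz parts of the experiment, which would explore the color table, are not shown.

```python
import pandas as pd

# Small reference table of named colors with their standard RGB values.
colors = pd.DataFrame({
    "color_name": ["black", "white", "red", "lime", "blue",
                   "yellow", "cyan", "magenta"],
    "R": [0, 255, 255, 0, 0, 255, 0, 255],
    "G": [0, 255, 0, 255, 0, 255, 255, 0],
    "B": [0, 255, 0, 0, 255, 0, 255, 255],
})


def nearest_color_name(r, g, b, table=colors):
    """Return the name whose RGB value has the smallest Manhattan distance."""
    distance = (
        (table["R"] - r).abs() + (table["G"] - g).abs() + (table["B"] - b).abs()
    )
    return table.loc[distance.idxmin(), "color_name"]


detected = nearest_color_name(250, 10, 5)
print(detected)
```

In a full project the (r, g, b) input would typically come from a pixel of an image, and the table would be a much larger CSV of named colors.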
Mr. B. B. Kondbhar
Subject In-charge