Practical - Questions - Unit 1 and 2

The document outlines a series of data analysis tasks, including Titanic Survival dataset analysis, dummy variable creation, untidy to tidy data transformation, Winsorization method, and missing value imputation. It also covers exploratory data analysis (EDA) and feature engineering on an insurance charges dataset, as well as linear regression model fitting and visualization. Each task specifies the steps to be performed, including data loading, visualization, statistical analysis, and model evaluation.

Uploaded by

zaheerkkd1312


Questions

1. Titanic Survival Dataset Analysis:

o Load the Titanic Survival dataset.
o Display 5 sample observations from the dataset.
o Check and display the dataset information.
o Count the number of survivors based on gender (sex-wise
survival count).
o Check if there are any null values in the dataset.
o Plot a count plot showing the survival count passenger-wise.
o Create a strip plot of Age vs. Sex with hue as Survival status.
Identify the key factors influencing survival.
o Plot a pie chart for survival status and identify the percentage of
survivors.
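A minimal sketch of the non-plotting steps in question 1, assuming the data has lowercase `sex`, `age`, and `survived` columns (as in seaborn's bundled copy of the dataset). The rows in `toy` are hypothetical stand-ins so the example runs offline, and `survival_overview` is an illustrative helper name, not part of any library.

```python
import pandas as pd

# Hypothetical stand-in rows so the sketch runs offline. The real data could
# be loaded with, e.g., seaborn.load_dataset("titanic") (needs network access).
toy = pd.DataFrame({
    "sex": ["male", "female", "female", "male", "male", "female"],
    "age": [22.0, 38.0, 26.0, None, 35.0, 27.0],
    "survived": [0, 1, 1, 0, 0, 1],
})

def survival_overview(df):
    """Return sex-wise survivor counts, per-column null counts, and % survived."""
    survivors_by_sex = df[df["survived"] == 1].groupby("sex").size()
    null_counts = df.isnull().sum()
    pct_survived = 100 * df["survived"].mean()
    return survivors_by_sex, null_counts, pct_survived

print(toy.sample(5, random_state=0))   # 5 sample observations
toy.info()                             # dataset information

by_sex, nulls, pct = survival_overview(toy)
print(by_sex)
print(nulls)
print(f"Survived: {pct:.1f}%")
```

For the plotting steps, one option is seaborn's `countplot` and `stripplot` (for example `stripplot(data=df, x="sex", y="age", hue="survived")`) and pandas' `Series.plot.pie` with `autopct` for the survival percentage.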
2. Dummy Variable Creation:
o Create a DataFrame in Python using the following dataset:

Item Color
Item1 Red
Item2 Green
Item3 Blue
Item4 Red
Item5 Green

o Generate dummy variables for the Color column.

o Display the resulting DataFrame with the dummy variables.
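One possible way to approach question 2 is `pd.get_dummies`, sketched below on the table given above. The `dtype=int` argument is a choice made here so the indicators print as 0/1 rather than booleans; the variable names are illustrative.

```python
import pandas as pd

# The Item/Color table from the question.
df = pd.DataFrame({
    "Item": ["Item1", "Item2", "Item3", "Item4", "Item5"],
    "Color": ["Red", "Green", "Blue", "Red", "Green"],
})

# One indicator column per distinct colour; the original Color column is replaced.
dummies = pd.get_dummies(df, columns=["Color"], dtype=int)
print(dummies)
```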
3. Untidy to Tidy Data Transformation:
o Consider the following dataset:
Country Year Population GDP
USA 2010 308 14992
USA 2011 311 15543
Canada 2010 34 1536
Canada 2011 35 1601

o Convert this untidy dataset into a tidy format such that the
variable names Population and GDP are listed in one column and
their respective values are listed in another column.
o Display the tidy dataset.
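A short sketch of the reshaping asked for in question 3, using `DataFrame.melt`. The output column names `Variable` and `Value` are assumptions chosen for illustration; the question does not prescribe them.

```python
import pandas as pd

# The wide table from the question.
wide = pd.DataFrame({
    "Country": ["USA", "USA", "Canada", "Canada"],
    "Year": [2010, 2011, 2010, 2011],
    "Population": [308, 311, 34, 35],
    "GDP": [14992, 15543, 1536, 1601],
})

# Stack Population and GDP into a name column and a value column.
tidy = wide.melt(
    id_vars=["Country", "Year"],
    value_vars=["Population", "GDP"],
    var_name="Variable",
    value_name="Value",
)
print(tidy)
```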
4. Winsorization Method:
o For the given data [10, 15, 20, 25, 100, 150, 200], replace the
outliers with the 5th and 95th percentiles using the Winsorization
method.
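One way to read question 4 is percentile clipping: compute the 5th and 95th percentiles and cap values that fall outside them. The sketch below takes that reading and assumes NumPy's default linear interpolation for percentiles; other interpolation rules would give slightly different bounds.

```python
import numpy as np

data = np.array([10, 15, 20, 25, 100, 150, 200], dtype=float)

# Percentile bounds (linear interpolation, NumPy's default).
lower, upper = np.percentile(data, [5, 95])

# Values below the lower bound or above the upper bound are replaced by the bound.
winsorized = np.clip(data, lower, upper)

print(lower, upper)   # 11.5 185.0
print(winsorized)
```

With seven points, only the smallest value (10) and the largest (200) lie outside the bounds, so only those two change.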
5. Missing Value Imputation:
o For the given dataset:

Feature 1 Feature 2
5 12
7 NaN
3 8
NaN 15
8 6
10 9
6 NaN
NaN 5
9 11
o Replace the missing values with the mean of their respective
columns.
o Replace the missing values with the median of their respective
columns.
o Replace the missing values using the K-Nearest Neighbors (KNN)
imputation method.
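The three imputation strategies in question 5 can be sketched as follows. The choice of `n_neighbors=2` for scikit-learn's `KNNImputer` is an assumption, since the question does not specify a neighbour count.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# The two-feature table from the question.
df = pd.DataFrame({
    "Feature 1": [5, 7, 3, np.nan, 8, 10, 6, np.nan, 9],
    "Feature 2": [12, np.nan, 8, 15, 6, 9, np.nan, 5, 11],
})

# Column-wise mean and median imputation.
mean_imputed = df.fillna(df.mean())
median_imputed = df.fillna(df.median())

# KNN imputation: each missing value is filled from the nearest rows,
# measured on the features both rows have observed.
knn = KNNImputer(n_neighbors=2)  # neighbour count is an assumption
knn_imputed = pd.DataFrame(knn.fit_transform(df), columns=df.columns)

print(mean_imputed)
print(median_imputed)
print(knn_imputed)
```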

Question: Exploratory Data Analysis (EDA) and Feature Engineering

a) Load the dataset Insurance Charges Prediction.csv into a DataFrame
and perform the following:
o Display the first 5 rows of the dataset.
o Display the dataset information.
o Provide the statistical summary of the dataset for numerical
features.
o Provide the statistical summary of the dataset for categorical
features.
b) Perform Univariate Analysis:
o Plot histograms for all numerical columns in the dataset.
c) Perform Bivariate Analysis:
o Visualize the distribution of charges based on:
- Gender (sex) using a boxplot.
- Region (region) using a boxplot.
- Smoking status (smoker) using a boxplot.
o Plot a count plot to show the distribution of smoker status with
hue as sex.
o Create scatter plots for:
- Age vs. Charges.
- BMI vs. Charges.
d) Perform Correlation Analysis:
o Filter out the numerical columns.
o Calculate and display the correlation matrix for numerical
variables.
o Visualize the correlation matrix using a heatmap.
e) Filter the categorical variables:
o Extract only categorical features.
o Display the names of the categorical columns.
f) Perform Feature Engineering:
o Use pd.get_dummies function to encode the categorical variables
into dummy variables.
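A rough sketch of the non-plotting parts of this question (summaries, correlation matrix, categorical filtering, and dummy encoding). The rows below are hypothetical stand-ins so the example runs without the CSV file, and the lowercase column names `age`, `bmi`, and `charges` are assumptions; in practice the frame would come from `pd.read_csv("Insurance Charges Prediction.csv")`.

```python
import pandas as pd

# Hypothetical rows standing in for the real file (values are made up).
df = pd.DataFrame({
    "age": [19, 33, 45, 52, 28, 61],
    "sex": ["female", "male", "male", "female", "male", "female"],
    "bmi": [27.9, 22.7, 30.1, 25.3, 33.0, 29.4],
    "smoker": ["yes", "no", "no", "no", "yes", "no"],
    "region": ["southwest", "northwest", "southeast",
               "northeast", "southeast", "northwest"],
    "charges": [16884.9, 4449.5, 8240.6, 10797.3, 36950.2, 13405.4],
})

print(df.head())      # first 5 rows
df.info()             # dataset information
print(df.describe())  # summary of numerical features

# Split columns by type.
numeric = df.select_dtypes(include="number")
categorical = df.select_dtypes(exclude="number")
cat_cols = list(categorical.columns)
print(categorical.describe())  # summary of categorical features
print(cat_cols)

# Correlation matrix over numerical columns only.
corr = numeric.corr()
print(corr)

# Encode the categorical columns as 0/1 dummy variables.
encoded = pd.get_dummies(df, columns=cat_cols, dtype=int)
print(encoded.head())
```

For the plots, `DataFrame.hist`, seaborn's `boxplot`, `countplot`, and `scatterplot`, and `seaborn.heatmap(corr, annot=True)` are common choices.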

Question: Linear Regression Model and Visualization

a) Given the following dataset:

x y
5 5
15 20
25 14
35 32
45 22
55 38

b) Plot a scatter plot to visualize the relationship between x and y.

c) Fit a Linear Regression model using x and y.
o Calculate the coefficient of determination (R²) to evaluate the
goodness of fit.
o Display the intercept and coefficient of the linear regression
model.
d) Predict the dependent variable (y) for the following new values of the
independent variable (x): 8, 15, and 35.
e) Plot the original data points and overlay the fitted regression line.
