Set-D CT2 Answerkey

Register Number:                                                    Set - D

SRM Institute of Science and Technology
College of Engineering and Technology
School of Computing
SRM Nagar, Kattankulathur – 603203, Chengalpattu District, Tamil Nadu
Academic Year: 2024-25 (EVEN)

Test: FT4                                        Date: 29-04-2025
Course Code & Title: 21CSS303T - Data Science    Duration: Two periods
Year & Sem: III Year / VI Sem                    Max. Marks: 50

Course Articulation Matrix:

Course Outcome   PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10  PO11  PO12
CO3               -    -    -    -    1    -    -    -    -    -     -     -
CO4               -    -    -    -    1    -    -    -    -    -     -     -
CO5               -    -    -    -    1    -    -    -    -    -     -     -

Note: CO3 – To identify data manipulation and cleaning techniques using pandas
      CO4 – To construct graphs and plots to represent data using Python packages
      CO5 – To apply the principles of data science techniques to predict and forecast the outcomes of real-world problems
Part – A (10 x 1 = 10 Marks)
Instructions:
1) Answer ALL questions.
2) The duration for answering Part A is 15 minutes (this sheet will be collected after 15 minutes).
3) Encircle the correct answer.

S.No Question Marks BL CO PO PI Code

1  State the data wrangling operation that handles errors, missing data and inconsistencies  1 1 3 5 5.4.1
a. Validation
b. Data enrichment
c. Cleaning
d. Organization
2  Name the pandas method that can be used to combine DataFrames using one or more keys, as in database join operations  1 1 3 5 5.4.1
a. pandas.concat
b. pandas.merge
c. DataFrame.combine_first
d. DataFrame.join
3 Define the objective of imputation process 1 1 3 5 5.4.1
a. Remove entire rows or columns containing missing values
b. Remove pairs of observations where at least one value is missing
c. Replacing missing data with estimated values
d. Remove noise from the dataset using some algorithms
4  Identify the reshape process among the following that turns unique values from one column into new column headers, effectively transforming long-form data to wide-form  1 2 3 5 5.4.1
a. Melting
b. Stacking
c. Pivoting
d. Unstacking

5  Which among the following is a common measure of dispersion of data?  1 2 3 5 5.4.1
a. median
b. standard deviation
c. histogram
d. skewness
6  In Matplotlib, which of the following correctly creates a subplot at position 5 in a 4-row by 3-column grid?  1 1 4 5 5.5.2
a. plt.subplot(3, 4, 5)
b. plt.subplot(5, 3, 4)
c. plt.subplot(4, 3, 5)
d. plt.subplot(5, 4, 3)
7  From the below list, recall the construct used to add text or markers to specific locations on a plot to highlight particular features  1 1 4 5 5.4.1
a. Legends
b. Labels
c. Annotations
d. Ticks
8  Among the following statements, recognize the correct statement about Python’s matplotlib.pyplot package  1 1 4 5 5.5.1
a. pyplot is used only for 3D plotting in Python.
b. pyplot automatically displays plots without the need to call show().
c. pyplot provides a MATLAB-like interface for creating static,
interactive, and animated plots.
d. pyplot cannot save plots in pdf format.
9  Identify the Seaborn package feature that allows you to visualize relationships between all pairs of numeric columns in a DataFrame  1 2 5 5 5.4.1
a. FacetGrid
b. Pairplot
c. Scatterplot
d. subplot
10  Identify the incorrect statement regarding the Seaborn package  1 2 5 5 5.5.1
a. Seaborn is a data visualization library built on top of Matplotlib
b. Seaborn allows us to represent data points in three-dimensional space
c. Seaborn can be imported using import matplotlib.seaborn as sns
d. Seaborn can be used to visualize textual data by creating word clouds

Part – B (4 x 5 = 20 Marks)
Instructions: Answer ANY FOUR Questions

Q.No  Question  Marks  BL  CO  PO  PI Code

11  Discuss different data structures that help optimize memory and computation while handling large data volumes. Briefly review their strengths and weaknesses.  5 2 3 5 5.6.1

Ans:
Data structures have different storage requirements, but they also influence the performance of CRUD (create, read, update, and delete) and other operations on the data set.

• Tree: a hierarchical data structure where each node has a parent and may have child nodes, used for searching and sorting. Trees are a class of data structure that lets you retrieve information much faster than scanning through a table.
• Hash: a key-value data structure that provides fast lookups using a hash function. You create a key for every value in your data and put the keys in buckets, so you can quickly retrieve the information by looking in the right bucket when you encounter the data. Dictionaries in Python are a hash-table implementation, and they are a close relative of key-value stores.
• Sparse data: refers to datasets with mostly zero or missing values, stored efficiently to save memory.
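
A minimal sketch of the hash and sparse ideas in Python (dummy values; SciPy is assumed to be available for the sparse part):

import numpy as np
from scipy.sparse import csr_matrix

# Hash: Python dicts are a hash-table implementation with O(1) average lookups.
salaries = {"E001": 50000, "E002": 62000, "E003": 58000}
print(salaries["E002"])  # retrieved directly via the hashed key, no scan needed

# Sparse data: store only the non-zero entries instead of the full matrix.
dense = np.zeros((1000, 1000))
dense[3, 7] = 1.5
dense[42, 999] = 2.0
sparse = csr_matrix(dense)
print(sparse.nnz, "non-zero values stored instead of", dense.size)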
12  Given the following scenario, perform appropriate data cleaning, transformation, and merging steps:  5 3 3 5 5.5.2

Dataset A contains employee records with columns: EmpID, Name, Age, and Department. Some age values are missing, and department names have inconsistent casing (e.g., "HR", "hr", "Hr").

Dataset B contains salary details with columns: EmpID, MonthlySalary.

Write Python code (using pandas) to:
1. Clean the Age column using suitable imputation
2. Clean the Name column by removing unnecessary spaces
3. Standardize capitalization of the Department column
4. Merge the two datasets on EmpID
5. Display the total salary aggregated by Department

(You may assume dummy data for illustration.)

Ans:
import pandas as pd
import numpy as np

# Dummy data, as permitted by the question
data_a = {'EmpID': [1, 2, 3, 4],
          'Name': [' Asha ', 'Ravi', ' Kumar', 'Meena '],
          'Age': [25, np.nan, 30, np.nan],
          'Department': ['HR', 'hr', 'IT', 'Hr']}
data_b = {'EmpID': [1, 2, 3, 4],
          'MonthlySalary': [30000, 35000, 40000, 32000]}

# 1. Convert the datasets to DataFrames
df_a = pd.DataFrame(data_a)
df_b = pd.DataFrame(data_b)

# 2. Clean the Age column using mean imputation
df_a['Age'] = df_a['Age'].fillna(df_a['Age'].mean())

# 3. Clean the Name column by removing unnecessary spaces
df_a['Name'] = df_a['Name'].str.strip()

# 4. Standardize capitalization of the Department column
df_a['Department'] = df_a['Department'].str.capitalize()

# 5. Merge the two datasets on EmpID
merged_df = pd.merge(df_a, df_b, on='EmpID')

# 6. Total salary aggregated by Department
total_salary_by_dept = merged_df.groupby('Department')['MonthlySalary'].sum().reset_index()
print(total_salary_by_dept)
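
Note: pd.merge performs an inner join by default, so an employee with no matching salary record in Dataset B would be dropped from merged_df; passing how='left' keeps such rows with NaN for MonthlySalary.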

13  Distinguish between Z-score normalization and Min-max normalization. Under what data conditions would each method be more appropriate?  5 2 3 5 5.6.1

Ans:
Z-score normalization is a data preprocessing technique that transforms numerical data to have a mean of 0 and a standard deviation of 1. This is particularly useful when dealing with features that have different scales or units, as it ensures that all features contribute equally to the model.

Advantages:
1. Handles different scales
2. Improves machine learning models
3. Reduces bias
4. Copes better with outliers than min-max scaling

Min-max normalization is a data preprocessing technique that scales numerical data to a specific range, typically between 0 and 1. It is useful when you want to preserve the relative distances between data points while ensuring that all features have a similar scale.

Conditions for use: min-max normalization suits data that is bounded and largely free of extreme values, since a single outlier compresses the remaining points into a narrow band; Z-score normalization is preferable when the data is approximately normally distributed or contains outliers.
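
For reference, the two transforms are z = (x − μ) / σ and x′ = (x − min) / (max − min). A minimal pandas sketch with dummy values:

import pandas as pd

# Dummy values; note the outlier 100
s = pd.Series([10, 20, 30, 40, 100])

# Z-score normalization: mean 0, standard deviation 1
z = (s - s.mean()) / s.std()

# Min-max normalization: rescaled to the range [0, 1]
mm = (s - s.min()) / (s.max() - s.min())

print(z.round(2).tolist())
print(mm.round(2).tolist())

The outlier illustrates the conditions above: under min-max, the four ordinary points are squeezed into [0, 0.33], while their z-scores remain well spread.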

14  Write the Python code for creating a 2 × 2 grid of plots with the following subplots using matplotlib.pyplot  5 3 4 5 5.5.2
1. Grid 1 – Line plot
2. Grid 2 – Scatter plot
3. Grid 3 – Bar plot
4. Grid 4 – Histogram

(You may assume dummy data, as in Q.No 12, for illustration.)

Ans:
import matplotlib.pyplot as plt
import numpy as np

# Data
x = np.arange(1, 6)
y = x ** 2
categories = ['A', 'B', 'C', 'D', 'E']
values = [5, 7, 3, 8, 6]
hist_data = np.random.randn(1000)

# Plotting
plt.figure(figsize=(10, 8))

plt.subplot(2, 2, 1)
plt.plot(x, y, marker='o')
plt.title('Line Plot')

plt.subplot(2, 2, 2)
plt.scatter(x, y, color='green')
plt.title('Scatter Plot')

plt.subplot(2, 2, 3)
plt.bar(categories, values, color='orange')
plt.title('Bar Plot')

plt.subplot(2, 2, 4)
plt.hist(hist_data, bins=20, color='purple')
plt.title('Histogram')

plt.tight_layout()
plt.show()
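
Note: the same layout can be produced with the object-oriented API, fig, axes = plt.subplots(2, 2), where each panel is addressed as axes[row, col]; this form is generally preferred in newer Matplotlib code.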
15  You are given a dataset that contains the daily temperature (Temp), humidity (Humidity), and air quality index (AQI) recorded over 5 days.  5 3 5 5 5.5.2
Days = [1, 2, 3, 4, 5]
Temperature = [23, 25, 28, 32, 35]
AQI = [3, 5, 4, 2, 5]
Write Python code using Seaborn and Matplotlib to visualize the relationship among these three variables using a 3D line plot, where:
• X-axis → Day (as a sequence)
• Y-axis → Temperature
• Z-axis → AQI

Ans:
import matplotlib.pyplot as plt
import seaborn as sns

# Data
Days = [1, 2, 3, 4, 5]
Temperature = [23, 25, 28, 32, 35]
AQI = [3, 5, 4, 2, 5]

# Create 3D plot
sns.set(style="whitegrid")
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')

# Plot the 3D line
ax.plot(Days, Temperature, AQI, marker='o', color='blue', label='Temp vs AQI')

# Label axes
ax.set_xlabel('Day')
ax.set_ylabel('Temperature (°C)')
ax.set_zlabel('AQI')
ax.set_title('3D Line Plot of Day vs Temperature vs AQI')

# Show plot
plt.legend()
plt.show()
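
Note: Seaborn has no 3D plotting API of its own; the projection='3d' axes come from Matplotlib's mplot3d toolkit, and Seaborn contributes only the whitegrid styling here. On older Matplotlib versions, from mpl_toolkits.mplot3d import Axes3D may be needed to register the 3D projection.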

Part – C (2 x 10 = 20 Marks)
Instructions: Answer ALL questions.

Q.No  Question  Marks  BL  CO  PO  PI Code
16 a  How are missing values represented in a dataset? With examples, describe the various imputation techniques used for handling missing values so that there is minimum loss of information.  10 2 3 5 5.5.1

Ans:
In a dataset, missing values are commonly represented as NaN (Not a Number) for numeric data, None for Python objects, blank cells, or sentinel values such as "NA" or -999; pandas treats NaN and None as missing. Imputation is the process of replacing missing data with estimated values to maintain dataset integrity.

Mean/Median/Mode imputation: replace missing values with the mean, median, or mode of the respective column. This is a simple approach but can introduce bias if the distribution is skewed.
When to use:
• Mean: best for normally distributed data.
• Median: preferred when data is skewed or has outliers.
• Mode: used for categorical data.

K-Nearest Neighbors (KNN) imputation: impute missing values using the average values of the k nearest neighbors. This method can be effective for numerical data.

Regression imputation: use regression models to predict missing values based on other features. This is suitable for numerical data with strong relationships between features.

Multiple imputation: create multiple imputed datasets by filling in missing values with different plausible values. This method can help account for uncertainty in the imputation process.

Choosing the right approach
The best approach for handling missing values depends on the nature of your data, the amount of missing data, and the specific requirements of your analysis. Consider the following factors:
• Amount of missing data: if there are many missing values, imputation might be preferable to deletion.
• Distribution of missing data: if missingness is random, imputation might be suitable; if missingness is related to other variables, more sophisticated techniques might be necessary.
• Impact of missing data on the analysis: if missing values are likely to bias your results, it is important to address them.

Give a simple example.
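
One such simple example (a sketch using dummy data; scikit-learn's KNNImputer is assumed to be available for the KNN variant):

import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Dummy data: missing values represented as NaN / None
df = pd.DataFrame({
    "Age": [25, np.nan, 32, 40, np.nan],
    "Salary": [30000, 35000, np.nan, 52000, 48000],
    "City": ["Chennai", None, "Madurai", "Chennai", "Salem"],
})

# Median imputation for a skew-prone numeric column
df["Age"] = df["Age"].fillna(df["Age"].median())

# Mode imputation for the categorical column
df["City"] = df["City"].fillna(df["City"].mode()[0])

# KNN imputation: estimate Salary from the 2 nearest rows (numeric columns only)
imputer = KNNImputer(n_neighbors=2)
df[["Age", "Salary"]] = imputer.fit_transform(df[["Age", "Salary"]])
print(df)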

(OR)

16 b  You are given a Pandas DataFrame containing a column Customer_Info with inconsistent entries like:  10 3 3 5 5.5.2

" Mr. Ramesh K , Chennai - 600001 "
"Ms. PRIYA D,COIMBATORE-641002"
"Dr. Arjun,Madurai - 625001"
"Mrs. Leela S , Chennai - 6251 "

Perform the following tasks using Pandas string manipulation methods:
1. Strip leading and trailing whitespaces from the entire Customer_Info column.
2. Replace all hyphens (-) with a single space and convert multiple spaces to a single space.
3. Extract the following components into new columns:
   o Title (Mr., Ms., Dr., etc.)
   o Name (in uppercase)
   o City (in title case)
4. Pad the PIN code column (if needed) so that all valid entries have 6 digits (e.g., "6251" becomes "006251").

Ans:
import pandas as pd

# Create the DataFrame
data = {
    'Customer_Info': [
        " Mr. Ramesh K , Chennai - 600001 ",
        "Ms. PRIYA D,COIMBATORE-641002",
        "Dr. Arjun,Madurai - 625001",
        "Mrs. Leela S , Chennai - 6251 "
    ]
}
df = pd.DataFrame(data)

# 1. Strip leading and trailing whitespaces
df['Customer_Info'] = df['Customer_Info'].str.strip()

# 2. Replace hyphens with a space and normalize multiple spaces
df['Customer_Info'] = df['Customer_Info'].str.replace('-', ' ', regex=False)
df['Customer_Info'] = df['Customer_Info'].str.replace(r'\s+', ' ', regex=True)

# 3. Extract Title, Name, City, and PIN using a regex
df[['Title', 'Name', 'City', 'PIN']] = df['Customer_Info'].str.extract(
    r'(Mr\.|Mrs\.|Ms\.|Dr\.)\s+([A-Za-z\s]+),?\s*([A-Za-z]+)\s+(\d+)', expand=True
)

# 4. Format the extracted fields
df['Name'] = df['Name'].str.upper().str.strip()
df['City'] = df['City'].str.title().str.strip()

# 5. Pad PIN with leading zeros if shorter than 6 digits
df['PIN'] = df['PIN'].str.zfill(6)

print(df[['Title', 'Name', 'City', 'PIN']])
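
With the four sample rows, this should yield Titles Mr., Ms., Dr., Mrs.; Names RAMESH K, PRIYA D, ARJUN, LEELA S; Cities Chennai, Coimbatore, Madurai, Chennai; and PINs 600001, 641002, 625001, 006251 (the last one padded by zfill), assuming the regex matches as written.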

17 a  Explain the features of the Seaborn library. Also describe the importance of FacetGrid, joint plot, and pair plot with example implementations.  10 2 4 5 5.5.1

Ans:
• Seaborn is a library mostly used for statistical plotting in Python.
• It is built on top of Matplotlib and provides beautiful default styles and color palettes to make statistical plots more attractive.

Features of Seaborn

Statistical graphics: Seaborn is specifically designed for creating statistical graphics, providing built-in functions for common visualizations like scatter plots, line plots, histograms, and more. This makes it easier to create visually appealing and informative plots for data analysis.

Data visualization themes: Seaborn offers pre-defined styles and themes that can quickly change the overall appearance of your plots. This helps create consistent and aesthetically pleasing visualizations without requiring extensive customization.

Integration with Pandas and NumPy: Seaborn seamlessly integrates with Pandas and NumPy, making it easy to work with DataFrames and arrays directly. This simplifies the workflow and reduces the amount of code needed for data analysis and visualization.

FacetGrid and pair plots: Seaborn provides FacetGrid for grouping data and creating subplots based on categorical variables. This is useful for comparing distributions or relationships across different groups. Pair plots allow you to visualize the relationships between all pairs of numeric columns in a DataFrame, helping you identify correlations and patterns.

Customization and flexibility: while Seaborn provides a high-level interface, it is built on top of Matplotlib, giving you access to its extensive customization options. This allows you to fine-tune your plots to meet your specific needs.

Ease of use: Seaborn's API is designed to be user-friendly and intuitive, making it easier to learn and use compared to Matplotlib. Its documentation is also well-written and provides clear examples.

Example implementations

FacetGrid: group data by a categorical variable and plot individual subplots for each category.
g = sns.FacetGrid(df, col="hue", height=4)

Jointplot: visualize the relationship between two variables and their distributions.
sns.jointplot(x='x', y='y', kind="scatter", data=data)

Pairplot: visualize the relationships between all pairs of numeric columns in a DataFrame.
sns.pairplot(df)
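
A small self-contained sketch putting the three together (dummy data; the column names x, y, and group are invented for this illustration):

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Dummy data
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.normal(size=200),
    "y": rng.normal(size=200),
    "group": rng.choice(["A", "B"], size=200),
})

# FacetGrid: one histogram of x per category of 'group'
g = sns.FacetGrid(df, col="group", height=3)
g.map(sns.histplot, "x")

# Jointplot: scatter of x vs y with marginal distributions
sns.jointplot(x="x", y="y", kind="scatter", data=df)

# Pairplot: pairwise relationships between all numeric columns
sns.pairplot(df)

plt.show()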

(OR)

17 b  You are provided with a sample dataset of product sales in a CSV file named product_sales.csv. The dataset contains the following columns:  10 3 5 5 5.5.2

Product_ID  Category     Region  Units_Sold  Sale_Price
P001        Electronics  South   120         14500
P002        Furniture    North   75          9800
P003        Electronics  East    10          13200
P004        Clothing     West    160         3200
P005        Furniture    South   90          8900
P006        Electronics  East    110         15000
P007        Clothing     North   140         3000

Using Seaborn, generate:
• A histogram showing the distribution of Units_Sold for all products.
• A box plot comparing Sale_Price across different Category values.
Ans:
# Import libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the CSV file
df = pd.read_csv('product_sales.csv')

# Set the Seaborn style
sns.set(style='whitegrid')

# 1. Histogram of Units_Sold
plt.figure(figsize=(8, 5))
sns.histplot(df['Units_Sold'], bins=10, kde=True, color='skyblue')
plt.title('Distribution of Units Sold')
plt.xlabel('Units Sold')
plt.ylabel('Frequency')
plt.tight_layout()
plt.show()

# 2. Box plot of Sale_Price by Category
plt.figure(figsize=(8, 5))
sns.boxplot(data=df, x='Category', y='Sale_Price', palette='Set2')
plt.title('Sale Price by Product Category')
plt.xlabel('Product Category')
plt.ylabel('Sale Price')
plt.tight_layout()
plt.show()

Course Outcome (CO) and Bloom’s Level (BL) Coverage in Questions

[Bar chart: CO coverage percentages, with bars at approximately 55% and 45%; y-axis runs from 0% to 60%.]
