Set No.: A
PERI INSTITUTE OF TECHNOLOGY (AUTONOMOUS)
Mannivakkam, Chennai-48.
B.E./B.Tech. DEGREE CAT-I EXAMINATION MARCH 2025
DEPARTMENT OF CIVIL/CSE/EEE/ECE/MECH/IT/AI&DS/CSBS/AIML/CYS
Regulations 2024
Semester:
Date:
Subject Code:
Time: Three Hours
Subject Name:
Maximum Marks: 100
Answer all Questions
PART-A (16 x 1=16 Marks)
(Multiple Choice)
CO CL
1 What is the first stage in a typical data science project lifecycle? CO1
A) Modeling
B) Data Cleaning
C) Problem Definition
D) Deployment
2 Which of the following is a NoSQL database? CO1
A) MySQL
B) PostgreSQL
C) MongoDB
D) SQLite
3 In data science, which role is primarily responsible for deploying machine learning models into production? CO1
A) Data Analyst
B) Business Analyst
C) Machine Learning Engineer
D) Database Administrator
4 What is the use of the SQL JOIN operation? CO1
A) Deletes duplicate records
B) Merges data from two or more tables
C) Updates a table structure
D) Adds indexes to a table
5 Which format is commonly used to store structured data in tabular form? CO1
A) .txt
B) .json
C) .csv
D) .mp4
6 What is the purpose of data sampling in data science? CO1
A) To increase the size of the dataset
B) To clean the data
C) To reduce data volume while maintaining representativeness
D) To convert data into NoSQL
7 Which of the following is NOT a phase in the data science process? CO2
A) Data Collection
B) Data Compilation
C) Data Modeling
D) Deployment
Sub Code : Page 1 of 5
8 What is data cleaning primarily used for? CO2
A) To generate new data
B) To remove errors and inconsistencies
C) To transform data into audio format
D) To visualize data
9 What is the main purpose of Exploratory Data Analysis (EDA)? CO2
A) To deploy a model
B) To clean the data
C) To understand patterns and relationships in the data
D) To encode variables
10 Which of the following is a visualization tool used in EDA? CO2
A) Naive Bayes
B) Box Plot
C) Principal Component
D) Linear Regression
11 Which Python library is widely used for data visualization in EDA? CO2
A) NumPy
B) TensorFlow
C) Seaborn
D) Scikit-learn
12 In a dataset, skewness indicates: CO3
A) The number of rows
B) The symmetry of the distribution
C) The variance of the mean
D) The number of unique values
13 Which transformation technique is used to normalize data between 0 and 1? CO3
A) One-Hot Encoding
B) Standardization
C) Min-Max Scaling
D) Log Transformation
14 What does reshaping data involve in EDA? CO3
A) Cleaning dirty data
B) Changing the structure of the dataset
C) Visualizing trends
D) Removing null values
15 Which of the following is an example of a classical statistical method? CO3
A) Decision Tree
B) Hypothesis Testing
C) Deep Learning
D) Clustering
16 In EDA, what is the purpose of a heatmap? CO3
A) To clean the data
B) To show correlation between variables
C) To calculate averages
D) To convert data into audio
PART-B (12 x 2=24 Marks)
CO CL
17 Define data science. CO1
18 List any two roles in a data science project. CO1
19 Mention any two stages in a data science project lifecycle. CO1
20 What is the difference between structured and unstructured data? CO1
21 Name any two file formats commonly used to store data. CO2
22 Write any two SQL commands used to interact with relational databases. CO2
23 What is data cleaning? Mention one technique. CO2
24 What is NoSQL? Give one example of a NoSQL database. CO2
25 What is the purpose of exploratory data analysis (EDA)? CO3
26 List any two visual tools used in EDA. CO3
27 Differentiate between EDA and classical statistical analysis. CO3
28 Mention any two Python libraries used for EDA. CO3
PART-C (6 x 10=60 Marks)
Marks CO CL
29 (a) Explain the data science process in detail. Describe each stage with suitable examples. CO1
[OR]
(b) Discuss the different roles involved in a data science project. How do they collaborate during a project lifecycle? CO1
30 (a) Describe the process of working with data from files and relational databases. Illustrate with examples using Python or SQL. CO1
[OR]
(b) Explain the process of choosing and evaluating machine learning models. Discuss how different models such as K-means, Naïve Bayes, and Linear Regression are selected based on the nature of the problem. Also, elaborate on how these models are validated. CO1
31 (a) What are the common data cleaning techniques used in data science? Explain how sampling helps in model validation. CO2
[OR]
(b) Compare and contrast structured databases (relational) with NoSQL databases. Discuss the types and use cases of NoSQL. CO2
32 (a) Write a detailed note on data management practices in a data science workflow. Why is managing data efficiently important? CO2
[OR]
(b) Define Exploratory Data Analysis (EDA). Explain its significance and how it differs from classical and Bayesian analysis. CO2
33 (a) Discuss various data visualization techniques used in EDA. Support your answer with examples and plots. CO3
[OR]
(b) Describe the steps involved in performing EDA using Python. Mention important libraries and functions. CO3
34 (a) Explain various data transformation techniques used in EDA, such as merging, reshaping, pivoting, and encoding. Provide code examples if possible. CO3
[OR]
(b) Discuss in detail the various model evaluation methods used for clustering models. Explain the working of the K-means algorithm and compare it with other unsupervised methods. Also, discuss the differences between memorization methods and generalization methods in machine learning. CO3
Instruction for Question paper preparation:
All questions are to be set in Book Antiqua, font size 12.
Line spacing: 1.15
Keep the images at the center and use high quality line diagrams only.
CL – Cognitive level as per the revised Bloom's taxonomy.
R-Remember, U-Understand, Ap-Apply, An-Analyze, E-Evaluate, C-Create
CO – Course Outcome.