Question Bank

Uploaded by

22501a05c6

P.V.P SIDDHARTHA INSTITUTE OF TECHNOLOGY

BRANCH: Computer Science & Engineering    REGULATION: PVP20
Course: B.Tech    Course Name: DATA SCIENCE
Course Code: 20CS4501A    Year and Semester: III-I
QUESTION BANK
UNIT - I
Q. NO.  QUESTION  [CO, LEVEL, MARKS]
1. Explain the phases of Data Science. [CO 1, Level 2, Marks 14]
2. What is Exploratory Data Analysis? Explain any two types of visualization. [CO 1, Level 2, Marks 14]
3. Explain various hyperparameter optimization techniques with suitable examples. [CO 1, Level 2, Marks 14]
4. Briefly explain the role of Data Science in various fields. [CO 1, Level 2, Marks 14]
5. Explain the Data Science phases and lifecycle, and name the tools used for Data Science. [CO 1, Level 2, Marks 14]
6. Explain the roles and stages in a data science project. [CO 1, Level 2, Marks 14]
7. Explain exploring and managing data in data science. [CO 1, Level 2, Marks 14]
8. Explain the various processes for preparing a dataset to perform a data science task. [CO 1, Level 2, Marks 14]
9. Define hyperparameter optimization and discuss various strategies for optimizing hyperparameters. [CO 1, Level 2, Marks 14]

UNIT-II
Q. NO.  QUESTION  [CO, LEVEL, MARKS]
1. Suppose that the data for analysis includes the attribute age. The age values for the data tuples are (in increasing order):
13, 15, 16, 16, 19, 20, 23, 29, 35, 41, 44, 53, 62, 69, 72
(i) Use min-max normalization to transform the value 45 for age onto the range [0, 1].
(ii) Use z-score normalization to transform the value 45 for age, where the standard deviation of age is 20.64 years.
[CO 2, Level 3, Marks 14]
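The two transformations asked for in the age question above can be checked numerically. The following is a minimal sketch, not a model answer; the function names `min_max` and `z_score` are illustrative, and the standard deviation is taken as given in the question rather than recomputed.

```python
from statistics import mean

ages = [13, 15, 16, 16, 19, 20, 23, 29, 35, 41, 44, 53, 62, 69, 72]
value = 45
given_std = 20.64  # standard deviation stated in the question


def min_max(v, data, new_min=0.0, new_max=1.0):
    """Rescale v linearly so that min(data) -> new_min and max(data) -> new_max."""
    lo, hi = min(data), max(data)
    return (v - lo) / (hi - lo) * (new_max - new_min) + new_min


def z_score(v, mu, sigma):
    """Express v as a number of standard deviations away from the mean."""
    return (v - mu) / sigma


minmax_45 = min_max(value, ages)
z_45 = z_score(value, mean(ages), given_std)
print(f"min-max: {minmax_45:.4f}, z-score: {z_45:.4f}")
```

Min-max depends only on the two extreme values, while the z-score depends on the mean and spread of the whole sample.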
2. a) Differentiate between data reduction and dimensionality reduction for data discretization.
b) Explain the role of attributes in classification.
[CO 2, Level 2, Marks 14]
3. Normalize the following group of data by using the following techniques.
200, 300, 400, 600, 1000
i. Min-max normalization
ii. Z-score normalization
iii. Decimal scaling
iv. Write your observations on the above techniques.
[CO 2, Level 3, Marks 14]
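A short sketch of the three normalization techniques applied to the five values above. It assumes the population standard deviation for the z-score (the question does not say which one to use) and the usual decimal-scaling rule of dividing by the smallest power of ten that brings every absolute value below 1. The function names are illustrative.

```python
from statistics import mean, pstdev

data = [200, 300, 400, 600, 1000]


def min_max_all(values):
    """Map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score_all(values):
    """Centre on the mean and divide by the population standard deviation (assumption)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def decimal_scaling(values):
    """Divide by 10**j, with j the smallest integer such that max(|v|) / 10**j < 1."""
    largest = max(abs(v) for v in values)
    j = 0
    while largest / 10 ** j >= 1:
        j += 1
    return j, [v / 10 ** j for v in values]


print(min_max_all(data))
print([round(z, 3) for z in z_score_all(data)])
print(decimal_scaling(data))
```

Comparing the three outputs side by side is one way to ground the requested observations: min-max and decimal scaling keep all values non-negative, whereas z-scores are centred on zero.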

4. How to implement Data Transformation and Data Discretization? Explain with examples. [CO 2, Level 2, Marks 14]
5. Given data = {2, 3, 4, 5, 6, 7; 1, 5, 3, 6, 7, 8}. Use the PCA algorithm to compute the principal component. [CO 2, Level 3, Marks 14]
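For the PCA question above, a minimal sketch using only the standard library. It assumes the two semicolon-separated rows are two features observed on six samples, and it uses the sample covariance (divisor n - 1); both are assumptions, since the question does not state them. For a 2x2 covariance matrix the eigenvalues have a closed form, so no linear algebra library is needed.

```python
import math

x = [2, 3, 4, 5, 6, 7]
y = [1, 5, 3, 6, 7, 8]
n = len(x)

# Step 1: centre the data.
mean_x, mean_y = sum(x) / n, sum(y) / n
dx = [a - mean_x for a in x]
dy = [b - mean_y for b in y]

# Step 2: sample covariance matrix [[s_xx, s_xy], [s_xy, s_yy]] (divisor n - 1 is an assumption).
s_xx = sum(a * a for a in dx) / (n - 1)
s_yy = sum(b * b for b in dy) / (n - 1)
s_xy = sum(a * b for a, b in zip(dx, dy)) / (n - 1)

# Step 3: eigenvalues of a symmetric 2x2 matrix from its trace and determinant.
trace = s_xx + s_yy
det = s_xx * s_yy - s_xy ** 2
root = math.sqrt(trace ** 2 - 4 * det)
lam1, lam2 = (trace + root) / 2, (trace - root) / 2

# Step 4: unit eigenvector for the largest eigenvalue (valid because s_xy != 0 here).
vx, vy = s_xy, lam1 - s_xx
norm = math.hypot(vx, vy)
pc1 = (vx / norm, vy / norm)

# Step 5: project the centred data onto the first principal component.
scores = [a * pc1[0] + b * pc1[1] for a, b in zip(dx, dy)]
print(lam1, lam2, pc1)
```

The ratio `lam1 / (lam1 + lam2)` gives the share of total variance captured by the first component.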
6. In real-world data, tuples with missing values for some attributes are a common occurrence. Apply various pre-processing methods for handling this problem. [CO 2, Level 3, Marks 14]
7. The following are the sorted price data (in rupees) of certain items in the supermarket.
4, 8, 15, 21, 21, 24, 25, 28, 34, 36, 39, 42, 51, 57, 60
Smooth the data by using the following smoothing techniques. Consider the bin size as 3.
i) Bin means
ii) Bin medians
iii) Bin boundaries
[CO 2, Level 3, Marks 14]
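The three smoothing techniques in the price question above can be sketched as follows. This assumes "bin size 3" means equal-depth bins of three values each, and that a value equidistant from both boundaries goes to the lower one; neither is stated in the question.

```python
from statistics import mean, median

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34, 36, 39, 42, 51, 57, 60]
bin_size = 3  # assumed to be the number of values per bin

# Equal-depth partitioning of the already sorted data.
bins = [prices[i:i + bin_size] for i in range(0, len(prices), bin_size)]


def smooth_by_means(groups):
    """Replace every value with the mean of its bin."""
    return [[mean(g)] * len(g) for g in groups]


def smooth_by_medians(groups):
    """Replace every value with the median of its bin."""
    return [[median(g)] * len(g) for g in groups]


def smooth_by_boundaries(groups):
    """Replace every value with the closer bin boundary (ties go to the lower one)."""
    smoothed = []
    for g in groups:
        lo, hi = g[0], g[-1]
        smoothed.append([lo if v - lo <= hi - v else hi for v in g])
    return smoothed


print(smooth_by_means(bins))
print(smooth_by_medians(bins))
print(smooth_by_boundaries(bins))
```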
8. a) Why is data transformation important, and when do we need it?
b) When do we use the splitting-table technique?
c) How can adding a redundant column cause loss of information? Justify.
[CO 2, Level 2, Marks 14]
9. Evaluate any two data reduction techniques with examples. What is the format for reporting the results of each? [CO 2, Level 3, Marks 14]

UNIT-III
Q. NO.  QUESTION  [CO, LEVEL, MARKS]
1. Explain the mean and variance of the Binomial Distribution and its properties. [CO 3, Level 2, Marks 14]
2. There is a class of 25 students whose mean test score is 60 out of 100, with a standard deviation of 4 marks from the mean. The other students of the school have a mean score of 50 on the same test. What will be the t-score for calculating the probability that school students scored not less than 60 on their tests? [CO 3, Level 3, Marks 14]
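The t-score in the question above follows the one-sample form t = (sample mean - reference mean) / (s / sqrt(n)). A minimal sketch, assuming the class of 25 is the sample and the school mean of 50 is the reference value:

```python
import math

n = 25                # class size (the sample)
sample_mean = 60
sample_sd = 4
reference_mean = 50   # mean score of the other students

standard_error = sample_sd / math.sqrt(n)
t_score = (sample_mean - reference_mean) / standard_error
degrees_of_freedom = n - 1

print(f"t = {t_score:.2f} with {degrees_of_freedom} degrees of freedom")
```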
3. Find the mean and standard deviation of a normal distribution in which 7% are under 35, and 89% are under 63. [CO 3, Level 3, Marks 14]
4. In a Normal Distribution 31% of the items are under 45 and 8% are over 64. Find the Mean and Variance of the Distribution. [CO 3, Level 3, Marks 14]
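Both normal-distribution questions above reduce to two equations of the form x = mu + z * sigma, where z is the standard normal quantile for the stated cumulative percentage. A minimal sketch using `statistics.NormalDist`; the helper name `normal_from_two_quantiles` is illustrative.

```python
from statistics import NormalDist


def normal_from_two_quantiles(x1, p1, x2, p2):
    """Solve x1 = mu + z1*sigma and x2 = mu + z2*sigma for (mu, sigma)."""
    standard = NormalDist()  # mean 0, standard deviation 1
    z1, z2 = standard.inv_cdf(p1), standard.inv_cdf(p2)
    sigma = (x2 - x1) / (z2 - z1)
    mu = x1 - z1 * sigma
    return mu, sigma


# 7% under 35 and 89% under 63.
mu_a, sigma_a = normal_from_two_quantiles(35, 0.07, 63, 0.89)

# 31% under 45 and 8% over 64, i.e. 92% under 64.
mu_b, sigma_b = normal_from_two_quantiles(45, 0.31, 64, 1 - 0.08)

print(f"first:  mean={mu_a:.2f}, sd={sigma_a:.2f}")
print(f"second: mean={mu_b:.2f}, variance={sigma_b ** 2:.2f}")
```

Hand solutions that read z from a printed table rounded to two decimals may differ slightly in the last digit.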
5. From the following data, find whether there is any significant link in the habit of taking soft drinks among categories of employees by using the chi-square test.
Soft drink   Clerks   Teachers   Officers
Pepsi        10       25         65
Thums Up     15       30         65
Fanta        50       60         30
[CO 3, Level 3, Marks 14]
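The chi-square test of independence for the soft drink table above can be sketched as below. The question gives no significance level, so the 5% level used for the comparison is an assumption; 9.488 is the standard table value for 4 degrees of freedom at that level.

```python
observed = {
    "Pepsi": [10, 25, 65],
    "Thums Up": [15, 30, 65],
    "Fanta": [50, 60, 30],
}  # columns: Clerks, Teachers, Officers

rows = list(observed.values())
row_totals = [sum(r) for r in rows]
col_totals = [sum(c) for c in zip(*rows)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
chi_square = 0.0
for row, row_total in zip(rows, row_totals):
    for count, col_total in zip(row, col_totals):
        expected = row_total * col_total / grand_total
        chi_square += (count - expected) ** 2 / expected

dof = (len(rows) - 1) * (len(col_totals) - 1)
critical_5pct = 9.488  # chi-square table value for dof = 4 at the 5% level (assumed level)

print(f"chi-square = {chi_square:.2f}, dof = {dof}")
print("reject independence" if chi_square > critical_5pct else "fail to reject independence")
```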

6. Explain the various methods of Data Collection involved in Data Science. [CO 3, Level 2, Marks 14]
7. In a sample of 1000 cases, the mean of a particular test is 14, and the standard deviation is 2.5. Assuming the distribution to be normal, determine:
i) how many students score between 12 and 15?
ii) how many score above 18?
iii) how many score below 18?
[CO 3, Level 3, Marks 14]
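The three counts in the question above come from areas under a normal curve with mean 14 and standard deviation 2.5, scaled by the 1000 cases. A minimal sketch with `statistics.NormalDist`:

```python
from statistics import NormalDist

scores = NormalDist(mu=14, sigma=2.5)
n_cases = 1000

between_12_and_15 = n_cases * (scores.cdf(15) - scores.cdf(12))
above_18 = n_cases * (1 - scores.cdf(18))
below_18 = n_cases * scores.cdf(18)

print(round(between_12_and_15), round(above_18), round(below_18))
```

Parts (ii) and (iii) are complements, so their counts add up to the full sample.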

8. Differentiate Stratified sampling and Cluster sampling techniques with examples. [CO 3, Level 2, Marks 14]
9. A pair of dice is thrown 360 times and the frequencies of each sum are indicated below. Would you say that the dice are fair on the basis of the Chi-Square test at the 0.01 level of significance? [CO 3, Level 3, Marks 14]
Sum        2   3   4   5   6   7   8   9   10  11  12
Frequency  8   24  35  37  44  65  51  42  26  14  14
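For the dice question above, a goodness-of-fit sketch. The expected frequencies assume two fair six-sided dice, so a sum s can occur in 6 - |s - 7| of the 36 equally likely outcomes; 23.209 is the standard chi-square table value for 10 degrees of freedom at the 0.01 level.

```python
observed = {2: 8, 3: 24, 4: 35, 5: 37, 6: 44, 7: 65,
            8: 51, 9: 42, 10: 26, 11: 14, 12: 14}
n_throws = sum(observed.values())

# Number of ways each sum can occur with two fair dice.
ways = {s: 6 - abs(s - 7) for s in range(2, 13)}
expected = {s: n_throws * ways[s] / 36 for s in ways}

chi_square = sum((observed[s] - expected[s]) ** 2 / expected[s] for s in observed)
dof = len(observed) - 1
critical_1pct = 23.209  # chi-square table value for dof = 10 at the 0.01 level

print(f"chi-square = {chi_square:.3f}, dof = {dof}")
print("reject fairness" if chi_square > critical_1pct else "no evidence against fairness")
```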
UNIT-IV
Q. NO.  QUESTION  [CO, LEVEL, MARKS]
1. If the logit score (linear predictor) is given by -2.4 + 1.5 X1 + 2 X2, find the estimated P(Y = 1) for each of the following combinations of the IDVs: [CO 4, Level 3]
X1:  0    1.5  2    3    -2   -2.5
X2:  1    0    1.5  -1   2    2.5
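The estimated probability in the question above is the logistic (sigmoid) function of the logit score, P(Y = 1) = 1 / (1 + exp(-score)). A minimal sketch that pairs the X1 and X2 values column by column:

```python
import math


def sigmoid(z):
    """Logistic function mapping a logit score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


x1_values = [0, 1.5, 2, 3, -2, -2.5]
x2_values = [1, 0, 1.5, -1, 2, 2.5]

probabilities = [
    sigmoid(-2.4 + 1.5 * x1 + 2 * x2)
    for x1, x2 in zip(x1_values, x2_values)
]

for x1, x2, p in zip(x1_values, x2_values, probabilities):
    print(f"X1={x1:>5}, X2={x2:>5} -> P(Y=1) = {p:.4f}")
```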
2. Which specific regressors seem essential in multiple regression? How will you address this question? Discuss. [CO 4, Level 2]
3. Suppose you have the following data with one real-valued input variable and one real-valued output variable. What is the leave-one-out cross-validation mean squared error in the case of linear regression (Y = bX + c)? [CO 4, Level 3]
X (independent variable)   Y (dependent variable)
0                          2
2                          2
3                          1
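Leave-one-out cross-validation for the three points above fits the line Y = bX + c on two points, predicts the held-out third, and averages the squared errors. A minimal sketch; `fit_line` is an illustrative helper using the ordinary least-squares formulas.

```python
def fit_line(points):
    """Ordinary least squares for y = b*x + c; returns (b, c)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    s_xx = sum((x - mean_x) ** 2 for x, _ in points)
    b = s_xy / s_xx
    return b, mean_y - b * mean_x


points = [(0, 2), (2, 2), (3, 1)]

squared_errors = []
for i, (x_out, y_out) in enumerate(points):
    training = points[:i] + points[i + 1:]   # leave point i out
    b, c = fit_line(training)
    squared_errors.append((y_out - (b * x_out + c)) ** 2)

loocv_mse = sum(squared_errors) / len(points)
print(squared_errors, loocv_mse)
```

With only two training points per fold, each fitted line passes exactly through both of them, so the folds can also be verified by hand.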
4. a) Discuss the need for fitting the model in multiple regression.
b) Explain Logistic Regression with an example.
[CO 4, Level 2]
5. a) Obtain the likelihood equation for estimating the parameters of a logistic regression model.
b) Define the multiple linear regression model. Explain the least squares method to estimate parameters in multiple linear regression models.
[CO 4, Level 2]
6. Explain Linear Discriminant Analysis in detail with an example. [CO 4, Level 2]
7. Apply linear regression using the method of least squares to the following data and predict the crop yield for a rainfall of 5 cm. [CO 4, Level 3]
Rainfall (in cm):                 10.5  8.8   13.4  12.5  18.8  10.3  7.0   15.6  16
Paddy yield (quintal per acre):   30.3  46.2  58.8  59.0  82.4  49.2  31.9  76.0  78.8
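A minimal least-squares sketch for the rainfall and paddy yield data above, using the closed-form slope and intercept. Variable names are illustrative. Note that 5 cm lies below the smallest observed rainfall (7.0 cm), so the prediction is an extrapolation.

```python
rainfall_cm = [10.5, 8.8, 13.4, 12.5, 18.8, 10.3, 7.0, 15.6, 16]
paddy_yield = [30.3, 46.2, 58.8, 59.0, 82.4, 49.2, 31.9, 76.0, 78.8]

n = len(rainfall_cm)
mean_x = sum(rainfall_cm) / n
mean_y = sum(paddy_yield) / n

# slope = S_xy / S_xx, intercept = mean_y - slope * mean_x
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(rainfall_cm, paddy_yield))
s_xx = sum((x - mean_x) ** 2 for x in rainfall_cm)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

predicted_yield_at_5cm = intercept + slope * 5
print(f"yield = {intercept:.3f} + {slope:.3f} * rainfall")
print(f"predicted yield at 5 cm: {predicted_yield_at_5cm:.2f} quintal per acre")
```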

8. Explain how overfitting and underfitting issues are handled in regression modeling. [CO 4, Level 2]
9. Build a linear regression model with the following data and test for overall fit. Also, test for the individual significance of X1 and of X2. [CO 4, Level 3]
Y:   12.8  13.9  15.2  18.3  14.5  12.4
X1:  2     3     5     5     4     1
X2:  4     2     5     1     2     3
10. Apply logistic regression to demonstrate binary classification. [CO 4, Level 3]
UNIT - V
Q. NO.  QUESTION  [CO, LEVEL, MARKS]
1. Explain the bias/variance dilemma about the model complexity. [CO 1, Level 2, Marks 14]
2. Explain k-fold cross validation and how it can be implemented for building a model. [CO 4, Level 3, Marks 14]
3. Imagine that you find out that your model has low bias and high variance. Which algorithm would be best suited to this problem? What is the reason? [CO 4, Level 3, Marks 14]
4. a) Develop the derivation of the estimate of in-sample error.
b) Explain the Minimum Description Length principle for model building.
[CO 1, Level 2, Marks 14]
5. Briefly explain how you would calculate a cross-validated estimate of prediction error in a linear regression. Is this estimate likely to be smaller or larger than the in-sample error? [CO 1, Level 2, Marks 14]
6. What is the holdout approach? What is the limitation of this approach? Name four alternative approaches for it. [CO 1, Level 2, Marks 14]
7. Explain bias and variance in machine learning and how bias-variance decomposition is used for deciding the model complexity. [CO 1, Level 2, Marks 14]
8. Given data set STr = {(xi, yi), i = 1, ..., 6}, xi ∈ ℝ a feature scalar, yi ∈ {-1, +1} a class label.
Data points in the data set are
(x1,y1)=(2,-1) (x2,y2)=(7,-1)
(x3,y3)=(4,+1) (x4,y4)=(1,-1)
(x5,y5)=(3,+1) (x6,y6)=(6,+1)
Suppose you are training a Linear Classifier f(x;a,b) = sign(ax+b) with 2-fold Cross Validation, where sign(z) = +1 if z ≥ 0 and -1 if z < 0.
Split STr into S1={(x1,y1) (x2,y2) (x3,y3)} and S2={(x4,y4) (x5,y5) (x6,y6)}.
After training the classifier f on S1, we have a1=-1, b1=5 and then try to validate the classifier on S2.
After training the classifier f on S2, we have a2=2, b2=-3 and then try to validate the classifier on S1.
Calculate the average training error in the 2-fold cross-validation.
[CO 4, Level 3, Marks 14]
9. Given data set TTr = {(xi, yi), i = 1, ..., 4}, xi ∈ ℝ a data point, yi ∈ {-1, +1} a corresponding label.
The data points are
(x1,y1)=(5,-1) (x2,y2)=(6,-1)
(x3,y3)=(1,+1) (x4,y4)=(4,-1)
Suppose you are training a Linear Classifier f(x;a,b) = sign(bx+a) with 2-fold Cross Validation, where sign(z) = +1 if z ≥ 0 and -1 if z < 0.
Split TTr into T1={(x1,y1) (x2,y2)} and T2={(x3,y3) (x4,y4)}.
After training the classifier f(x;a,b) on T1, we have a1=-3, b1=4 and then try to validate the classifier on T2.
After training the classifier f(x;a,b) on T2, we have a2=1, b2=-5 and then try to validate the classifier on T1.
Calculate the average validation error (i.e. the cross-validation error) in the 2-fold cross-validation.
[CO 4, Level 3, Marks 14]
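The two 2-fold cross-validation questions above (8 and 9) can be checked with a small sketch. It assumes "error" means the misclassification rate (fraction of points whose predicted sign differs from the label), which the questions do not state explicitly. Note the different parameter roles: question 8 uses sign(ax+b), while question 9 uses sign(bx+a). The helper takes the multiplier of x as `weight` and the constant as `bias`.

```python
def sign(z):
    """+1 when z >= 0, otherwise -1, as defined in the questions."""
    return 1 if z >= 0 else -1


def error_rate(weight, bias, samples):
    """Fraction of (x, y) samples misclassified by sign(weight * x + bias)."""
    wrong = sum(1 for x, y in samples if sign(weight * x + bias) != y)
    return wrong / len(samples)


# Question 8: f(x; a, b) = sign(a*x + b), so weight = a and bias = b.
s1 = [(2, -1), (7, -1), (4, 1)]
s2 = [(1, -1), (3, 1), (6, 1)]
q8_avg_training_error = (error_rate(-1, 5, s1) + error_rate(2, -3, s2)) / 2
q8_avg_validation_error = (error_rate(-1, 5, s2) + error_rate(2, -3, s1)) / 2

# Question 9: f(x; a, b) = sign(b*x + a), so weight = b and bias = a.
t1 = [(5, -1), (6, -1)]
t2 = [(1, 1), (4, -1)]
q9_avg_validation_error = (error_rate(4, -3, t2) + error_rate(-5, 1, t1)) / 2

print(q8_avg_training_error, q8_avg_validation_error, q9_avg_validation_error)
```

Training error evaluates each classifier on the fold it was fitted on; validation error evaluates it on the other fold.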
