Multiple Choice Questions on Model Development and Evaluation
1. Which of the following best describes classification in machine learning?
A) Predicting continuous outcomes based on input features
B) Predicting categorical labels based on input features
C) Grouping similar data points into clusters
D) Assessing model performance using metrics
2. Which of the following is an example of a binary classification task?
A) Classifying emails as spam or not spam
B) Predicting house prices based on square footage
C) Segmenting customers based on purchase frequency
D) Recognizing handwritten digits
3. What is the primary function of Logistic Regression?
A) To predict multiple classes simultaneously
B) To predict the probability of a binary outcome
C) To create a hierarchy of clusters
D) To minimize variance within clusters
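Note: a minimal Python sketch of the idea behind question 3, using scikit-learn (the feature values and labels below are invented for illustration). Logistic regression estimates the probability of the positive class, which is then thresholded into a hard label.

from sklearn.linear_model import LogisticRegression

# Toy data: one feature, binary label (illustrative values only)
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [0, 0, 0, 1, 1, 1]

model = LogisticRegression()
model.fit(X, y)

# predict_proba returns [P(class 0), P(class 1)] for each sample;
# near the decision boundary both probabilities sit close to 0.5
print(model.predict_proba([[3.5]]))
print(model.predict([[3.5]]))   # hard label, thresholded at 0.5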
4. Which of the following algorithms is NOT typically used for classification tasks?
A) Decision Trees
B) K-Means Clustering
C) Random Forest
D) Support Vector Machines
5. In multiple linear regression, what does the model aim to predict?
A) A single independent variable
B) A continuous output based on multiple predictors
C) Categorical labels from input features
D) The variance within clusters
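Note: to make question 5 concrete, a minimal scikit-learn sketch (the predictor and target values are invented). Several predictors go in, one continuous value comes out.

from sklearn.linear_model import LinearRegression

# Two predictors (square footage, number of rooms) and a continuous target (price)
X = [[1200, 3], [1500, 3], [1800, 4], [2100, 4], [2400, 5]]
y = [200_000, 240_000, 280_000, 320_000, 360_000]

model = LinearRegression()
model.fit(X, y)
print(model.coef_, model.intercept_)   # one coefficient per predictor
print(model.predict([[2000, 4]]))      # continuous prediction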
6. What is the main goal of clustering in machine learning?
A) To predict future outcomes based on historical data
B) To group similar data points without prior labels
C) To assess model performance using metrics
D) To determine the reasons behind past outcomes
7. Which clustering algorithm is characterized by its ability to identify noise points?
A) K-Means Clustering
B) Hierarchical Clustering
C) DBSCAN
D) Random Forest
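Note: a minimal sketch for question 7 (the eps and min_samples values are arbitrary choices for this toy data). DBSCAN gives points that fall in no dense region the label -1, marking them as noise.

import numpy as np
from sklearn.cluster import DBSCAN

# Two tight groups plus one isolated outlier
X = np.array([[1.0, 1.0], [1.1, 1.0], [0.9, 1.1],
              [8.0, 8.0], [8.1, 8.0], [7.9, 8.1],
              [50.0, 50.0]])

labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(X)
print(labels)   # the isolated point is labelled -1 (noise)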
8. Underfitting occurs when a model:
A) Is too complex and learns noise in the training data
B) Is too simple to capture the underlying patterns in the data
C) Performs well on unseen data but poorly on training data
D) Uses too many parameters for training
9. What does cross-validation help assess in a machine learning model?
A) The complexity of the model architecture
B) The generalization capability of the model on unseen data
C) The accuracy of predictions made during training
D) The number of features used in the model
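Note: a minimal sketch of the idea behind question 9, using scikit-learn's bundled iris data. Each fold is held out once while the model trains on the rest, so the scores estimate performance on unseen data.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Five folds; each score comes from data the model did not train on
scores = cross_val_score(model, X, y, cv=5)
print(scores, scores.mean())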
10. The formula for calculating precision is:
A) True Positives / (True Positives + False Negatives)
B) True Positives / (True Positives + False Positives)
C) (True Positives + False Positives) / Total Predictions
D) (True Positives + False Negatives) / Total Actual Positives
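Note: a worked example of the precision formula in question 10 (the counts are made up). With 3 true positives and 1 false positive, precision = 3 / (3 + 1) = 0.75.

from sklearn.metrics import precision_score

y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0]
# TP = 3, FP = 1  ->  precision = 3 / (3 + 1) = 0.75
print(precision_score(y_true, y_pred))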
11. Which metric provides a balance between precision and recall?
A) Accuracy
B) F1-Score
C) ROC-AUC
D) Mean Absolute Error
12. The area under the ROC curve (AUC):
A) Ranges from -1 to 1
B) Indicates perfect separation at 0.5
C) Ranges from 0 to 1, where 1 indicates perfect separation
D) Is only applicable to regression models
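Note: a minimal sketch for question 12 (the scores are invented). roc_auc_score takes true labels and predicted scores and returns a value between 0 and 1; 1 means the classes are perfectly separable, while 0.5 is no better than chance.

from sklearn.metrics import roc_auc_score

y_true = [0, 0, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]   # every positive outscores every negative
print(roc_auc_score(y_true, scores))      # 1.0, perfect separation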
13. Mean Squared Error (MSE):
A) Is calculated as the average of absolute differences between predicted and actual values
B) Is calculated as the average of squared differences between predicted and actual values
C) Provides an error measure in the same units as the target variable
D) Cannot be used for regression evaluation
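Note: a worked sketch for question 13 (the values are illustrative). MSE is the average of the squared differences between predicted and actual values.

import numpy as np

y_true = np.array([3.0, 5.0, 2.0])
y_pred = np.array([2.5, 5.0, 4.0])

mse = np.mean((y_true - y_pred) ** 2)
print(mse)   # (0.25 + 0.0 + 4.0) / 3 = 1.4166...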
14. In predictive analytics, what is primarily forecasted?
A) Historical trends and patterns
B) Reasons behind past outcomes
C) Future events based on historical data
D) Actions to achieve desired outcomes
15. What is a key step in model deployment?
A) Regularly retraining the model with new data
B) Creating an API for external applications to interact with the model
C) Assessing model performance using accuracy metrics
D) Documenting the development process
16. Monitoring a deployed model involves:
A) Regularly updating software libraries
B) Assessing performance metrics like accuracy and precision
C) Preparing the model for deployment
D) All of the above
17. Which operational metric measures how many predictions are made per unit time?
A) Latency
B) Throughput
C) Drift
D) Accuracy
18. Regular retraining of a model is essential because:
A) It reduces computational costs
B) It ensures adaptation to changes in data patterns
C) It simplifies the model architecture
D) It eliminates noise from training data
19. Which type of analytics focuses on recommending actions to achieve desired outcomes?
A) Descriptive Analytics
B) Diagnostic Analytics
C) Predictive Analytics
D) Prescriptive Analytics
20. What does drift refer to in model monitoring?
A) Changes in software dependencies
B) Changes in input data relationships leading to potential degradation
C) Variations in prediction latency over time
D) Fluctuations in accuracy metrics
21. Which regression metric provides an error measure in the same units as the target variable?
A) Mean Absolute Error (MAE)
B) Mean Squared Error (MSE)
C) Root Mean Squared Error (RMSE)
D) R-squared
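Note: continuing the MSE sketch above, question 21's point is that taking the square root brings the error back into the target's own units.

import numpy as np
from sklearn.metrics import mean_squared_error

y_true = [3.0, 5.0, 2.0]
y_pred = [2.5, 5.0, 4.0]

mse = mean_squared_error(y_true, y_pred)   # in squared target units
rmse = np.sqrt(mse)                        # in the same units as the target
print(mse, rmse)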
22. In hierarchical clustering, what is primarily built?
A) An ensemble of decision trees
B) A hierarchy of clusters
C) Linear equations based on predictors
D) Predictions for categorical labels
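Note: a minimal sketch for question 22 (toy points). Agglomerative methods record a sequence of merges; read bottom-up, that linkage matrix is the hierarchy of clusters, often drawn as a dendrogram.

from scipy.cluster.hierarchy import linkage

X = [[1.0, 1.0], [1.2, 1.1], [5.0, 5.0], [5.1, 4.9], [9.0, 9.0]]

# Each row of Z records one merge: the two clusters joined, the distance
# between them, and the size of the newly formed cluster
Z = linkage(X, method="ward")
print(Z)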
23. What is a common framework for creating APIs for machine learning models?
A) TensorFlow
B) Flask
C) Scikit-learn
D) Keras
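Note: a minimal sketch of what question 23 alludes to, assuming a pre-trained scikit-learn model saved as model.pkl (the filename and JSON field name are illustrative, not from the source). Flask exposes a /predict endpoint that other applications can call over HTTP.

import pickle
from flask import Flask, request, jsonify

app = Flask(__name__)

# Assumed artifact: any pre-trained scikit-learn estimator pickled to disk
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]   # e.g. [[1200, 3]]
    prediction = model.predict(features).tolist()
    return jsonify({"prediction": prediction})

if __name__ == "__main__":
    app.run()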
24. Which type of regression involves one independent variable only?
A) Simple Linear Regression
B) Multiple Linear Regression
C) Polynomial Regression
D) Logistic Regression
25. What does K-Fold Cross-Validation help prevent?
A) Overfitting
B) Underfitting
C) Noise in training data
D) Model complexity
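Note: a minimal sketch of the mechanics behind question 25 (ten dummy samples). K-Fold splits the data so that every sample is held out exactly once, which exposes overfitting that a single train/test split might miss.

import numpy as np
from sklearn.model_selection import KFold

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(np.arange(10)):
    print(train_idx, test_idx)   # each sample appears in exactly one test fold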
26. The harmonic mean used in F1-Score calculation helps balance which two metrics?
A) Accuracy and Recall
B) Precision and Recall
C) Precision and F1-Score
D) Recall and ROC-AUC
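Note: for question 26, the harmonic mean formula with a quick check (reusing the toy labels from the precision example): F1 = 2 * precision * recall / (precision + recall).

from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0]

p = precision_score(y_true, y_pred)   # 0.75
r = recall_score(y_true, y_pred)      # 0.75
print(2 * p * r / (p + r))            # harmonic mean of the two
print(f1_score(y_true, y_pred))       # same value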
27. Which type of analytics seeks to understand past data trends?
A) Descriptive Analytics
B) Predictive Analytics
C) Diagnostic Analytics
D) Prescriptive Analytics
28. The primary cause of overfitting is:
A) Too few parameters in the model
B) Not enough training time
C) Too many parameters relative to training samples
D) Insufficient feature selection
29. Which algorithm is particularly useful for customer segmentation based on features like purchase frequency?
A) Logistic Regression
B) K-Means Clustering
C) Decision Trees
D) Random Forest
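Note: a minimal sketch for question 29 (the customer numbers are invented). K-Means groups customers into a chosen number of segments based on numeric features such as purchase frequency and average order value.

import numpy as np
from sklearn.cluster import KMeans

# Columns: purchases per month, average order value (illustrative numbers)
customers = np.array([[1, 20], [2, 25], [1, 30],
                      [10, 200], [12, 180], [11, 220]])

segments = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(customers)
print(segments)   # two clearly separated groups (cluster ids may be swapped)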
30. What should be included in comprehensive documentation during model maintenance?
A) Only code snippets used for training
B) Results from previous experiments only
C) Development, deployment, and update processes
D) User feedback post-deployment