Machine Learning Questions and Answers

1. Supervised vs Unsupervised Learning

| Aspect | Supervised Learning | Unsupervised Learning |
| Definition | Learns from labeled data (input-output pairs) | Learns from unlabeled data |
| Objective | Predict outcomes or classify data | Discover hidden patterns or groupings |
| Example Algorithms | Linear Regression, SVM, Decision Trees | K-Means, PCA, Hierarchical Clustering |
| Example Use Case | Email spam detection (spam/not spam) | Customer segmentation in marketing |

2. Overfitting and Prevention Techniques

Overfitting occurs when a model learns the noise in the training data instead of the actual patterns,
resulting in poor generalization to new data.

Prevention Techniques:

1. Regularization: Adds a penalty to the loss function to constrain model complexity (e.g.,
L1/Lasso, L2/Ridge regularization).

2. Cross-validation: Helps ensure the model performs well on unseen data by splitting the data
into training and validation sets.

3. Curse of Dimensionality

Definition: As the number of features (dimensions) increases, the volume of the feature space grows
exponentially, making the data sparse and distance measures less meaningful.

Effect on Models:

• Increases computational cost

• Reduces model performance due to overfitting and sparse data

• Makes training harder and generalization poorer

Solution: Dimensionality reduction (e.g., PCA), feature selection.
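For illustration, a minimal sketch of dimensionality reduction with PCA (the synthetic data, and the use of scikit-learn and NumPy, are assumptions for this example):

```python
import numpy as np
from sklearn.decomposition import PCA

# 200 samples with 50 correlated features (synthetic, intrinsically rank-5 data)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 50))

# Keep just enough components to explain 95% of the variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

print(X.shape, "->", X_reduced.shape)        # e.g. (200, 50) -> (200, 5)
print(pca.explained_variance_ratio_.sum())   # >= 0.95
```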

4. Cost Function for Linear Regression and Gradient Descent

Cost Function (Mean Squared Error, MSE):

For predictions ŷᵢ = w·xᵢ + b over m training examples,

J(w, b) = (1/2m) · Σᵢ (ŷᵢ − yᵢ)²

Gradient descent then repeatedly updates the parameters in the direction that reduces J:

w := w − α · ∂J/∂w,  b := b − α · ∂J/∂b

where α is the learning rate.

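A minimal NumPy sketch of gradient descent on the MSE cost (the toy data, learning rate, and iteration count are illustrative choices, not from the original):

```python
import numpy as np

# Toy data: y ≈ 2x + 1 with a little noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2 * x + 1 + rng.normal(0, 0.5, size=100)

w, b, alpha = 0.0, 0.0, 0.01   # initial parameters and learning rate
m = len(x)

for _ in range(2000):
    y_hat = w * x + b
    cost = (1 / (2 * m)) * np.sum((y_hat - y) ** 2)   # MSE cost J(w, b)
    dw = (1 / m) * np.sum((y_hat - y) * x)            # ∂J/∂w
    db = (1 / m) * np.sum(y_hat - y)                  # ∂J/∂b
    w -= alpha * dw
    b -= alpha * db

print(round(w, 2), round(b, 2), round(cost, 4))   # w, b should approach 2 and 1
```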
5. Decision Trees vs Random Forests

| Aspect | Decision Tree | Random Forest |
| Overfitting | Prone to overfitting | Less prone due to averaging over multiple trees |
| Accuracy | Moderate | Generally higher due to ensemble approach |
| Interpretability | High (easy to visualize) | Low (multiple trees are hard to interpret) |
| Speed | Fast training and prediction | Slower training due to many trees |

6. SVM Classifier (with Diagram)

Concept: SVM finds the hyperplane that best separates the classes with the maximum margin.

Key Points:

• Works well in high-dimensional space

• Supports kernels (e.g., linear, RBF) for non-linear classification


Diagram:

Class A (o), Class B (x)

o | x

| x

--------|-------- <- Optimal Hyperplane

| x

o | x

Support vectors are closest points to the hyperplane on either side.
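A minimal scikit-learn sketch of this idea (the toy points, kernel, and C value are illustrative):

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable classes (toy data)
X = np.array([[1, 2], [2, 3], [2, 1], [6, 5], [7, 7], [8, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0)   # swap kernel="rbf" for non-linear boundaries
clf.fit(X, y)

print(clf.support_vectors_)         # the points closest to the hyperplane
print(clf.predict([[3, 2], [7, 6]]))
```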

7. Cross-Validation

Definition: Technique to assess how the model will generalize to an independent dataset.

K-Fold Cross-Validation:

• Data is divided into K subsets.

• Model is trained on K-1 subsets and tested on the remaining.

• Repeated K times; average performance is reported.

Importance:

• Reduces risk of overfitting

• Provides a better estimate of model performance
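A sketch of K-fold cross-validation with scikit-learn (the iris dataset and logistic regression are chosen purely for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold CV: train on 4 folds, test on the held-out fold, repeat 5 times
scores = cross_val_score(model, X, y, cv=5)
print(scores)         # one accuracy per fold
print(scores.mean())  # average performance reported
```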

8. Gradient Descent Variants

| Type | Description | Pros | Cons |
| Batch Gradient Descent | Uses the entire dataset to compute gradients | Stable convergence | Slow for large datasets |
| Stochastic GD (SGD) | Uses one sample at a time | Fast updates, handles big data | Noisy updates, may overshoot |
| Mini-Batch GD | Uses a small batch of data (e.g., 32, 64 samples) | Combines speed and stability | Requires batch size tuning |
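A NumPy sketch showing how one loop covers all three variants: batch_size = 1 gives SGD and batch_size = len(X) gives Batch GD (the data and hyperparameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(1000, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + rng.normal(0, 0.1, size=1000)

w = np.zeros(3)
alpha, batch_size = 0.1, 32   # 1 -> SGD, len(X) -> Batch GD, in between -> Mini-Batch

for epoch in range(200):
    idx = rng.permutation(len(X))            # shuffle each epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        # gradient of squared error on just this batch
        grad = X[batch].T @ (X[batch] @ w - y[batch]) / len(batch)
        w -= alpha * grad

print(np.round(w, 2))   # should approach [1.5, -2.0, 0.5]
```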

9. Explain the working of K-means clustering algorithm with an example

K-means is an unsupervised learning algorithm used for clustering data into K distinct
groups based on feature similarity.

Steps:

1. Choose the number of clusters K.


2. Initialize K centroids randomly.
3. Assign each point to the nearest centroid (forming clusters).
4. Update the centroids as the mean of the points in each cluster.
5. Repeat steps 3–4 until centroids stop changing (convergence).

Example:

Given points: A(1,2), B(1,4), C(5,7), D(6,8)


If K=2:

• Initial centroids: A, C
• Cluster 1: A, B → new centroid = (1,3)
• Cluster 2: C, D → new centroid = (5.5, 7.5)

Repeat until centroids stabilize.
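A NumPy sketch of these steps on the example points above, starting from centroids A and C as in the example:

```python
import numpy as np

points = np.array([[1, 2], [1, 4], [5, 7], [6, 8]])   # A, B, C, D
centroids = points[[0, 2]].astype(float)              # initial centroids: A and C

for _ in range(10):                                    # usually converges in a few steps
    # Step 3: assign each point to its nearest centroid
    dists = np.linalg.norm(points[:, None] - centroids[None, :], axis=2)
    labels = dists.argmin(axis=1)
    # Step 4: recompute each centroid as the mean of its cluster
    new_centroids = np.array([points[labels == k].mean(axis=0) for k in range(2)])
    if np.allclose(new_centroids, centroids):          # Step 5: convergence check
        break
    centroids = new_centroids

print(labels)      # [0 0 1 1] -> {A, B} and {C, D}
print(centroids)   # [[1.  3. ] [5.5 7.5]]
```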

10. What is the role of activation functions in neural networks? Compare ReLU, Sigmoid, and Tanh functions

Role:
Activation functions introduce non-linearity into the network, allowing it to learn complex
patterns.

Comparison:

| Function | Equation | Range | Pros | Cons |
| Sigmoid | σ(x) = 1 / (1 + e^(−x)) | (0, 1) | Smooth, probabilistic output | Vanishing gradient, slow |
| Tanh | tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)) | (−1, 1) | Zero-centered output | Gradient still vanishes |
| ReLU | f(x) = max(0, x) | [0, ∞) | Fast convergence, sparsity | Can die (neurons stuck at 0) |
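Minimal NumPy implementations of the three functions for reference:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # squashes to (0, 1)

def tanh(x):
    return np.tanh(x)                 # zero-centered, squashes to (-1, 1)

def relu(x):
    return np.maximum(0.0, x)         # identity for x > 0, zero otherwise

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))   # [0.119 0.5   0.881]
print(tanh(x))      # [-0.964  0.     0.964]
print(relu(x))      # [0. 0. 2.]
```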

11. How does a Naïve Bayes classifier work? What are its advantages and
limitations?

Working:

• Based on Bayes' Theorem:

P(C | X) = P(X | C) · P(C) / P(X)

• Assumes feature independence given the class.


• Selects the class with the highest posterior probability.

Advantages:

• Fast and simple


• Works well with high-dimensional data (e.g., text classification)

Limitations:

• Assumes independence among features (often unrealistic)


• Not ideal for correlated features
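A small scikit-learn sketch of Naïve Bayes on text (the toy messages and labels are invented for illustration; MultinomialNB is one common choice for word counts):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["win a free prize now", "free money win", "meeting at noon",
         "project status update", "claim your free reward"]
labels = [1, 1, 0, 0, 1]   # 1 = spam, 0 = not spam

vec = CountVectorizer()                 # bag-of-words features
X = vec.fit_transform(texts)
clf = MultinomialNB().fit(X, labels)

print(clf.predict(vec.transform(["free prize waiting"])))   # likely [1]
```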

12. Discuss the significance of reinforcement learning and its real-world applications

Reinforcement Learning (RL) is a type of learning where an agent learns by interacting with an environment to maximize a reward signal (a minimal sketch follows the applications list below).

Significance:

• Models sequential decision-making


• Learns optimal policies without labeled data

Real-world applications:

• Game playing (e.g., AlphaGo)


• Robotics (e.g., robotic arm control)
• Recommendation systems (dynamic content personalization)
• Autonomous vehicles (decision-making)
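A minimal Q-learning sketch of the RL loop; the 5-state corridor environment and all hyperparameters are invented for illustration:

```python
import numpy as np

# Toy environment: a 5-cell corridor; reaching the rightmost cell pays reward 1
n_states, n_actions = 5, 2            # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.5, 0.9, 0.2     # learning rate, discount, exploration rate
rng = np.random.default_rng(0)

for _ in range(500):                  # episodes of interaction
    s = 0
    while s != n_states - 1:
        # epsilon-greedy: mostly exploit the best-known action, sometimes explore
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
        s2 = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s2 == n_states - 1 else 0.0
        # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

print(Q.argmax(axis=1)[:-1])   # learned policy for non-terminal states: all 1 (move right)
```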

13. Explain the concept of overfitting and underfitting in machine learning. How can they be prevented?

| Concept | Description | Prevention Techniques |
| Overfitting | Model learns noise; high training accuracy, poor generalization | Regularization, cross-validation, pruning, dropout |
| Underfitting | Model is too simple; fails to learn underlying patterns | Use more complex models, add features, reduce bias |

14. Describe the k-Nearest Neighbors (k-NN) algorithm and its working with
an example

k-NN is a non-parametric algorithm used for classification and regression.

Working:

1. Choose k.
2. Compute the distance (e.g., Euclidean) from the query point to all data points.
3. Select the k nearest neighbors.
4. Majority vote (classification) or average (regression) to make prediction.

Example:

To classify a new point, if its 3 nearest neighbors are {dog, dog, cat}, k-NN predicts dog.
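The same idea with scikit-learn (the toy "cat"/"dog" points are illustrative):

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[1, 1], [1, 2], [2, 1],        # "cat" region
     [6, 6], [6, 7], [7, 6]]        # "dog" region
y = ["cat", "cat", "cat", "dog", "dog", "dog"]

knn = KNeighborsClassifier(n_neighbors=3)   # k = 3
knn.fit(X, y)
print(knn.predict([[6, 5]]))                # ['dog'] by majority vote
```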

15. What is the role of a confusion matrix in model evaluation? Explain its
components

A confusion matrix is a performance metric for classification tasks showing actual vs predicted values.

Structure (for binary classification):

| | Predicted Positive | Predicted Negative |
| Actual Positive | True Positive (TP) | False Negative (FN) |
| Actual Negative | False Positive (FP) | True Negative (TN) |

Key Metrics:

• Precision = TP / (TP + FP)
• Recall = TP / (TP + FN)
• F1-score = harmonic mean of precision and recall = 2 · (Precision · Recall) / (Precision + Recall)
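A scikit-learn sketch computing these metrics from toy predictions:

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# confusion_matrix returns [[TN, FP], [FN, TP]] for labels {0, 1}
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, fp, fn, tn)                    # TP=4, FP=1, FN=1, TN=4
print(precision_score(y_true, y_pred))   # TP / (TP + FP) = 0.8
print(recall_score(y_true, y_pred))      # TP / (TP + FN) = 0.8
print(f1_score(y_true, y_pred))          # harmonic mean = 0.8
```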

16. Compare and contrast Reinforcement Learning and Supervised Learning with examples

| Aspect | Supervised Learning | Reinforcement Learning |
| Data | Labeled data (input-output pairs) | No labeled data; a reward signal guides learning |
| Objective | Minimize error between prediction and truth | Maximize cumulative reward |
| Feedback | Instant and direct | Delayed and indirect |
| Example | Email classification | Robot navigating a maze |

17. Describe how Genetic Algorithms work in machine learning. Provide an example use case

Genetic Algorithms (GAs) are optimization techniques inspired by natural selection.

Working Steps:

1. Initialization: Generate random population (solutions).


2. Selection: Choose best-performing individuals.
3. Crossover: Combine parts of two individuals.
4. Mutation: Randomly alter some genes.
5. Repeat until convergence.

Example Use Case:

• Feature selection: GA can optimize which subset of features yields best model
accuracy.
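A toy sketch of the five steps, maximizing the number of 1-bits in a string; for feature selection, each bit would instead mark whether a feature is kept (this objective and all parameters are illustrative):

```python
import random
random.seed(0)

def fitness(bits):                    # toy objective: count of 1-bits (OneMax)
    return sum(bits)

POP, GENES, GENERATIONS = 20, 10, 40
# 1. Initialization: random population of bit-strings
pop = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]

for _ in range(GENERATIONS):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:POP // 2]                      # 2. Selection: keep the fittest half
    children = []
    while len(children) < POP - len(parents):
        a, b = random.sample(parents, 2)
        cut = random.randrange(1, GENES)
        child = a[:cut] + b[cut:]                 # 3. Crossover: splice two parents
        if random.random() < 0.1:                 # 4. Mutation: occasionally flip a gene
            i = random.randrange(GENES)
            child[i] = 1 - child[i]
        children.append(child)
    pop = parents + children                      # 5. Repeat with the new generation

print(max(fitness(ind) for ind in pop))           # approaches GENES (all ones)
```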

Machine Learning Questions and Answers (Continued)


18. What is the difference between Hard Margin and Soft Margin in Support
Vector Machines (SVMs)?

• Hard Margin SVM:


o Assumes data is linearly separable.
o Finds a hyperplane that perfectly separates the classes with maximum
margin.
o No tolerance for misclassification.
o May fail if data is not perfectly separable.
• Soft Margin SVM:
o Allows some misclassifications using a penalty parameter C.
o Balances between maximizing margin and minimizing classification error.
o More robust in the presence of noise or overlapping classes.

19. Explain the structure of an Artificial Neural Network (ANN) with a suitable diagram.

• Structure:
o Input Layer: Accepts the feature set.
o Hidden Layers: Perform transformations using weights, biases, and activation
functions.
o Output Layer: Produces the prediction (classification or regression).

Diagram:

Input Layer → Hidden Layer(s) → Output Layer


o o
o → o → o
o o
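A minimal NumPy sketch of one forward pass through this structure (the layer sizes, random weights, and activations are illustrative):

```python
import numpy as np

def relu(x): return np.maximum(0, x)
def sigmoid(x): return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)
x = np.array([0.5, -1.2, 3.0])                   # input layer: 3 features

W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)    # hidden layer: 4 neurons
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)    # output layer: 1 neuron

h = relu(W1 @ x + b1)        # hidden layer: weights, bias, activation
y = sigmoid(W2 @ h + b2)     # output layer: a prediction in (0, 1)
print(y)
```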

20. Describe the concept of backpropagation and how it is used to train neural
networks.

Backpropagation is the algorithm used to train neural networks by updating weights to minimize the error.

• Steps:
1. Forward pass: Compute output using current weights.
2. Loss computation: Calculate error using a loss function.
3. Backward pass: Compute gradients of the loss with respect to each weight.
4. Update weights: Use gradient descent to adjust weights.
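A minimal NumPy sketch of these four steps for a single sigmoid neuron learning logical OR (the architecture and hyperparameters are illustrative):

```python
import numpy as np

# Tiny network: 2 inputs -> 1 sigmoid neuron, trained to learn logical OR
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 1])

rng = np.random.default_rng(0)
w, b, lr = rng.normal(size=2), 0.0, 0.5

for _ in range(5000):
    z = X @ w + b                          # 1. forward pass
    y_hat = 1 / (1 + np.exp(-z))
    loss = np.mean((y_hat - y) ** 2)       # 2. loss computation (MSE)
    # 3. backward pass (chain rule): dL/dw = dL/dy_hat * dy_hat/dz * dz/dw
    dz = 2 * (y_hat - y) * y_hat * (1 - y_hat) / len(X)
    dw, db = X.T @ dz, dz.sum()
    w -= lr * dw                           # 4. update weights via gradient descent
    b -= lr * db

print(round(loss, 4), (y_hat > 0.5).astype(int))   # low loss, predictions [0 1 1 1]
```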

21. Discuss the role of regularization in machine learning. Explain L1 (Lasso) and L2 (Ridge) regularization with their impact on model performance.

Regularization prevents overfitting by adding a penalty term to the loss function.

• L1 (Lasso) Regularization:

o Can shrink some weights to zero (feature selection).


o Leads to sparse models.
• L2 (Ridge) Regularization:

o Penalizes large weights without eliminating them.


o Promotes weight smoothing.
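A scikit-learn sketch contrasting the two penalties (synthetic data where only two features matter; the alpha values are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.1, size=100)  # only 2 features matter

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print(np.round(lasso.coef_, 2))  # most coefficients driven exactly to 0 (sparse)
print(np.round(ridge.coef_, 2))  # all coefficients shrunk but non-zero
```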

22. Explain the K-means clustering algorithm. How do you determine the
optimal number of clusters?

• K-means Clustering:
o Clusters data into K groups based on feature similarity.
o Uses distance measures to assign points and update centroids.
• Determining K:
o Use the Elbow Method: Plot the within-cluster sum of squares (WCSS) for
various values of K and choose the point where the curve elbows.
o Silhouette Score: Measures how similar a point is to its cluster compared to
other clusters.
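A sketch computing WCSS (KMeans's inertia_) and silhouette scores across K; the synthetic data has three well-separated blobs, so K = 3 should stand out:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2))
               for c in ([0, 0], [5, 5], [0, 5])])

for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_ is the WCSS: plot it vs K and pick the elbow,
    # or pick the K with the highest silhouette score
    print(k, round(km.inertia_, 1), round(silhouette_score(X, km.labels_), 2))
```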

23. What are the advantages of Hierarchical Clustering over K-means clustering? Explain Agglomerative Hierarchical Clustering with an example.

Advantages of Hierarchical Clustering:

• No need to predefine number of clusters.


• Dendrogram gives a clear visual representation.
• Can capture nested clusters.

Agglomerative Clustering:

• Bottom-up approach.
• Each point starts as its own cluster.
• Iteratively merge the closest pair of clusters until one cluster remains.

Example:

• Given points: A, B, C, D
• Initially: {A}, {B}, {C}, {D}
• Merge closest: {A, B}, {C, D}
• Continue until one cluster remains.
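A SciPy sketch of agglomerative clustering on four toy points standing in for A, B, C, D:

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

points = np.array([[1, 1], [1.5, 1], [8, 8], [8.5, 8]])   # A, B, C, D

Z = linkage(points, method="single")   # bottom-up merges of the closest clusters
print(Z)                               # each row: clusters merged, distance, new size

labels = fcluster(Z, t=2, criterion="maxclust")   # cut the tree into 2 clusters
print(labels)                          # e.g. [1 1 2 2] -> {A, B} and {C, D}
# dendrogram(Z) would draw the merge tree (requires matplotlib)
```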

24. What is Backpropagation? Derive the weight update equation in a simple neural network using gradient descent. Compare CNN and RNN.

• Backpropagation:
o Computes the gradient of the loss function w.r.t weights.
o Uses chain rule to propagate errors from output to input layers.
• Weight Update Equation:

w := w − α · ∂L/∂w

where α is the learning rate and ∂L/∂w is the gradient of the loss L with respect to the weight, obtained via the chain rule (backpropagation).

Comparison: CNN vs RNN

| Feature | CNN | RNN |
| Used For | Image, spatial data | Sequential data, time series |
| Memory | No memory of past input | Remembers previous input |
| Architecture | Convolutional + pooling layers | Loops through time with hidden state |
| Example | Image classification | Text generation, language modeling |

25. Explain the structure and working of an Artificial Neural Network (ANN)
with a suitable diagram. How is ML used in Disaster Management System?
Discuss with case study.

• ANN Structure:
o Same as explained in Q19.

ML in Disaster Management:

• Use cases:
o Predict floods, earthquakes, forest fires using satellite data and historical
patterns.
o Optimizing resource allocation during disasters.

Case Study:

• Google AI for Flood Forecasting:


o Uses ML + satellite imagery to predict river water levels and warn populations
in flood-prone areas (India, Bangladesh).
o Achieved better accuracy and earlier alerts than traditional systems.

26. What are the key areas where we engage Machine Learning as a tool to get
advantages in our daily life? How is ML used in healthcare for disease
prediction? Discuss with case studies.

Key Areas:

• Voice assistants (e.g., Alexa, Siri)


• Recommendation systems (Netflix, YouTube)
• Smart home devices
• Spam filtering
• Language translation

ML in Healthcare:

• Disease prediction using medical history, lab results, imaging.

Case Study 1:

• IBM Watson: Analyzes medical records and literature to assist in cancer treatment
recommendations.

Case Study 2:

• Diabetes Prediction using logistic regression, decision trees, and neural networks
trained on patient data to forecast onset.


Machine Learning Questions and Answers (Continued)


27. Explain the working of the Linear Regression model and derive its cost function using Mean Squared Error (MSE).

Linear regression models the target as a linear function of the input, ŷ = w·x + b, and fits w and b to the training data. Each residual (ŷᵢ − yᵢ) measures the prediction error; squaring the residuals penalizes large errors equally in both directions and keeps the cost differentiable. Averaging over the m training examples gives the MSE cost J(w, b) = (1/2m) · Σᵢ (ŷᵢ − yᵢ)², which gradient descent minimizes (see Q4 for the update rule).

28. Explain the advantages and disadvantages of Naïve Bayes Classifier.

Advantages:

• Simple, fast, and efficient.

• Performs well on text classification problems (e.g., spam detection).

• Works well with high-dimensional data.

Disadvantages:

• Assumes independence among features, which is rarely true.

• Poor estimates for correlated features.

• Zero-frequency problem: if a category is missing in training data, it leads to zero probability.

29. Consider a dataset with the following points and labels:

| Point (X, Y) | Class |
| (2, 4) | A |
| (5, 8) | B |
| (1, 3) | A |
| (6, 9) | B |
| (3, 6) | A |

A new data point (4, 7) needs classification using K = 3.
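A worked sketch of the answer with NumPy: the three nearest neighbors of (4, 7) are (5, 8) → B, (3, 6) → A, and (6, 9) → B, so the majority vote is class B:

```python
import numpy as np
from collections import Counter

points = np.array([[2, 4], [5, 8], [1, 3], [6, 9], [3, 6]])
labels = ["A", "B", "A", "B", "A"]
query = np.array([4, 7])

dists = np.linalg.norm(points - query, axis=1)    # Euclidean distances
nearest = np.argsort(dists)[:3]                   # K = 3
print([(labels[i], round(dists[i], 2)) for i in nearest])
# [('B', 1.41), ('A', 1.41), ('B', 2.83)] -> two votes B, one vote A
print(Counter(labels[i] for i in nearest).most_common(1)[0][0])   # 'B'
```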

30. State the difference between Random Forest and Decision Tree method.

| Feature | Decision Tree | Random Forest |
| Definition | A single tree-based model | An ensemble of multiple decision trees |
| Overfitting | Prone to overfitting | Reduced overfitting due to averaging |
| Accuracy | Lower on unseen data | Higher due to ensemble voting |
| Interpretability | Easy to interpret | Harder to interpret (many trees) |
| Performance | Fast but unstable | Slower but more stable and accurate |

31. K-Means Clustering – One Iteration


Dataset:

| X | Y |
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
| 8 | 9 |
| 9 | 10 |
| 10 | 11 |

Initial centroids:

• Centroid 1 (C1) = (2, 3)

• Centroid 2 (C2) = (9, 10)
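A NumPy sketch of the requested iteration; note the given initial centroids already equal the cluster means, so this dataset converges immediately:

```python
import numpy as np

points = np.array([[1, 2], [2, 3], [3, 4], [8, 9], [9, 10], [10, 11]])
centroids = np.array([[2.0, 3.0], [9.0, 10.0]])   # C1, C2

# One iteration: assign each point to its nearest centroid, then update
dists = np.linalg.norm(points[:, None] - centroids[None, :], axis=2)
labels = dists.argmin(axis=1)
print(labels)   # [0 0 0 1 1 1]: first three points go to C1, last three to C2

new_centroids = np.array([points[labels == k].mean(axis=0) for k in range(2)])
print(new_centroids)   # C1 -> (2, 3), C2 -> (9, 10): unchanged, so converged
```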

32. A bank wants to classify whether a customer should get a loan based on income level and
credit score.

Dataset:
| Income | Credit Score | Loan Approved? |
| High | Good | Yes |
| Low | Good | No |
| High | Bad | No |
| Medium | Good | Yes |
| Medium | Bad | No |
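A hedged sketch of fitting a decision tree to this table (pandas/scikit-learn, one-hot encoding, and the entropy criterion are illustrative choices):

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

df = pd.DataFrame({
    "Income": ["High", "Low", "High", "Medium", "Medium"],
    "Credit": ["Good", "Good", "Bad", "Good", "Bad"],
    "Loan":   ["Yes", "No", "No", "Yes", "No"],
})

X = pd.get_dummies(df[["Income", "Credit"]])   # one-hot encode the categories
tree = DecisionTreeClassifier(criterion="entropy").fit(X, df["Loan"])
print(export_text(tree, feature_names=list(X.columns)))
# The learned rules should mirror the table: Bad credit -> No;
# Good credit + Low income -> No; Good credit + High/Medium income -> Yes
```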

33. Explain Linear and Nonlinear Regression Method.

• Linear Regression assumes a linear relationship: y = wx + b

• Nonlinear Regression models more complex curves, e.g., exponential, polynomial, etc.

Example:

• Linear: House price vs square footage

• Nonlinear: Population growth over time
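A tiny NumPy illustration of the contrast (synthetic data; the log-transform is one common way to fit an exponential trend with linear tools):

```python
import numpy as np

x = np.arange(1, 9, dtype=float)
y_lin = 3 * x + 2                      # linear trend (e.g., price vs square footage)
y_exp = 2.0 ** x                       # nonlinear trend (e.g., population growth)

print(np.polyfit(x, y_lin, 1))         # degree-1 fit recovers slope 3, intercept 2
print(np.polyfit(x, np.log(y_exp), 1)) # fitting log(y) linearly: slope ≈ ln 2 ≈ 0.693
```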

34. Difference between Clustering and Classification in ML

| Feature | Classification | Clustering |
| Labels | Labeled data (supervised) | Unlabeled data (unsupervised) |
| Objective | Predict categories | Discover structure |
| Example | Email spam detection | Customer segmentation |

35. Spam Classification Dataset

"Free" "Win" Spam?

Yes No Yes

No Yes No

Yes Yes Yes

No No No

Yes No Yes
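A worked sketch with Laplace smoothing, which also sidesteps the zero-frequency problem from Q28; the query message Free = Yes, Win = Yes is an assumption, since the original question text does not state one:

```python
rows = [("Yes", "No", "Yes"), ("No", "Yes", "No"), ("Yes", "Yes", "Yes"),
        ("No", "No", "No"), ("Yes", "No", "Yes")]   # (Free, Win, Spam)

spam = [r for r in rows if r[2] == "Yes"]
ham = [r for r in rows if r[2] == "No"]
print(len(spam) / len(rows))   # prior P(Spam) = 3/5

def likelihood(rows_c, i, value):
    # Laplace smoothing (+1) avoids zero probability for unseen combinations
    return (sum(1 for r in rows_c if r[i] == value) + 1) / (len(rows_c) + 2)

# Classify a message with Free = Yes, Win = Yes (hypothetical query)
p_spam = (3 / 5) * likelihood(spam, 0, "Yes") * likelihood(spam, 1, "Yes")
p_ham = (2 / 5) * likelihood(ham, 0, "Yes") * likelihood(ham, 1, "Yes")
print("Spam" if p_spam > p_ham else "Not spam")   # Spam (0.192 vs 0.05)
```
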
36. Key features of Naïve Bayes Classifier

• Assumes feature independence

• Based on Bayes theorem

• Efficient and simple

• Works well on high-dimensional data

• Common in text classification

37. Fraud Detection using Naïve Bayes

| Amount | Suspicious | Fraud (Yes) | Fraud (No) |
| High | Yes | 8 | 2 |
| High | No | 5 | 4 |
| Medium | Yes | 6 | 5 |
| Medium | No | 3 | 7 |
| Low | Yes | 3 | 8 |
| Low | No | 2 | 9 |

Total Yes = 27, No = 35

1. Prior Probabilities:

P(Yes) = 27/62, P(No) = 35/62

2. Likelihoods (for a transaction with Amount = Medium, Suspicious = Yes):

• P(Medium | Yes) = (6 + 3)/27 = 9/27; P(Suspicious = Yes | Yes) = (8 + 6 + 3)/27 = 17/27

• P(Medium | No) = (5 + 7)/35 = 12/35; P(Suspicious = Yes | No) = (2 + 5 + 8)/35 = 15/35

3. Posterior (proportional):

• Yes ∝ P(Yes) · P(Medium | Yes) · P(Suspicious = Yes | Yes) = (27/62)(9/27)(17/27) ≈ 0.091

• No ∝ P(No) · P(Medium | No) · P(Suspicious = Yes | No) = (35/62)(12/35)(15/35) ≈ 0.083

Since 0.091 > 0.083, the transaction is classified as Fraud = Yes.
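A few lines of Python verifying the comparison:

```python
# Verify the posterior comparison for Amount = Medium, Suspicious = Yes
p_yes, p_no = 27 / 62, 35 / 62                 # priors
p_med_yes, p_susp_yes = 9 / 27, 17 / 27        # likelihoods given Fraud = Yes
p_med_no, p_susp_no = 12 / 35, 15 / 35         # likelihoods given Fraud = No

score_yes = p_yes * p_med_yes * p_susp_yes
score_no = p_no * p_med_no * p_susp_no
print(round(score_yes, 4), round(score_no, 4))            # 0.0914 vs 0.0829
print("Fraud" if score_yes > score_no else "Not fraud")   # Fraud
```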


38. Define in context of DBSCAN

1. Core points: Have minimum number of neighbors (minPts) within radius (eps)

2. Border points: Within eps of a core point but not a core themselves

3. Noise points: Not within eps of any core point

Density reachability helps form clusters by connecting points that are density-reachable via a chain
of core points.
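A scikit-learn sketch (the eps and min_samples values and the synthetic data are illustrative):

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 0.3, (30, 2)),   # dense cluster 1
               rng.normal([5, 5], 0.3, (30, 2)),   # dense cluster 2
               [[10.0, 0.0]]])                     # an isolated point

db = DBSCAN(eps=0.8, min_samples=5).fit(X)         # eps = radius, min_samples = minPts
print(np.unique(db.labels_))   # [-1  0  1]: two clusters plus noise (-1)
print(db.labels_[-1])          # -1: the isolated point is labeled noise
```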

39. Apriori Algorithm & Association Rule Mining

• Apriori Algorithm: Identifies frequent itemsets in transactional data to form rules.

• Support: Proportion of transactions that contain an itemset.

• Confidence: Likelihood that item Y is bought when item X is bought.

• Lift: How much more likely Y is bought with X compared to random chance.

Pruning: Eliminate candidate sets that do not meet minimum support.

Apriori Property: A superset of an infrequent itemset cannot be frequent.


This reduces computational complexity by limiting the number of itemsets considered.

Optimization: Use hash trees, vertical data format, or FP-Growth to reduce candidate generation.
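A pure-Python sketch of the three metrics on toy transactions (the rule {bread} → {butter} and the data are invented for illustration):

```python
# Toy transactions; compute support, confidence, and lift for {bread} -> {butter}
transactions = [{"bread", "butter"}, {"bread", "milk"}, {"bread", "butter", "milk"},
                {"milk"}, {"bread", "butter"}]
n = len(transactions)

def support(itemset):
    # fraction of transactions containing every item in the itemset
    return sum(1 for t in transactions if itemset <= t) / n

sup_xy = support({"bread", "butter"})   # 3/5 = 0.6
conf = sup_xy / support({"bread"})      # 0.6 / 0.8 = 0.75
lift = conf / support({"butter"})       # 0.75 / 0.6 = 1.25

print(sup_xy, conf, lift)   # lift > 1: the rule is positively associated
```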
