**Solution to the Machine Learning Question Paper**
**btech-cai-cs-it-aid-cai-6-sem-machine-learning-6e7102-jul-2023.pdf**
**PART-A (2 marks each)**
**Q. 1. Express the Markov property mathematically.**
**Answer:** The Markov property states that the future state depends only on the current
state, not on the sequence of events that preceded it. Mathematically, it can be expressed
as:
P(S<sub>t+1</sub> | S<sub>t</sub>) = P(S<sub>t+1</sub> | S<sub>t</sub>, S<sub>t-1</sub>, S<sub>t-2</sub>, ..., S<sub>0</sub>)
Where:
* S<sub>t+1</sub> is the state at time t+1
* S<sub>t</sub> is the state at time t
* S<sub>0</sub>, S<sub>1</sub>, ..., S<sub>t-1</sub> are the preceding states.
**Q. 2. Give a clear difference between episodic and continuous tasks of the Markov
process.**
**Answer:**
* **Episodic Tasks:** Have a clear start and end point, dividing the process into episodes.
Examples include games like chess.
* **Continuous Tasks:** Do not have a clear end point and continue indefinitely (also called continuing tasks). Examples include controlling a robot.
**Q. 3. Why is dimensionality reduction required for a dataset?**
**Answer:** Dimensionality reduction is required to reduce the number of features in a
dataset, which helps to simplify the model, reduce computational cost, and prevent
overfitting.
**Q. 4. Which cost function is used in logistic regression and why?**
**Answer:** Logistic regression uses the logistic loss (binary cross-entropy) as its cost function. It is used because, combined with the sigmoid hypothesis, it yields a convex cost surface with a single global minimum, whereas squared error would be non-convex; this makes it well suited to gradient descent optimization.
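For reference, a standard form of this cost over m training examples, with hypothesis h<sub>θ</sub>(x) = σ(θ<sup>T</sup>x), is:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[\,y^{(i)}\log h_\theta(x^{(i)}) + \bigl(1-y^{(i)}\bigr)\log\bigl(1-h_\theta(x^{(i)})\bigr)\right]$$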
**Q. 5. Write names of different types of clustering methods.**
**Answer:** Different types of clustering methods include:
* K-means clustering
* Hierarchical clustering
* DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
* Gaussian Mixture Models (GMM)
**Q. 6. What is the use of attribute selection measure in a decision tree classifier?**
**Answer:** Attribute selection measures are used to select the best attribute to split the data at each node of the decision tree. Common measures such as Information Gain, Gain Ratio, and the Gini Index quantify which attribute yields the purest (best-separated) child nodes.
**Q. 7. Define Singular Value Decomposition (SVD).**
**Answer:** Singular Value Decomposition (SVD) is a matrix factorization technique that decomposes any m × n matrix A into three matrices, A = UΣV<sup>T</sup>, where U (m × m) and V (n × n) are orthogonal matrices and Σ (m × n) is a diagonal matrix whose non-negative diagonal entries are the singular values of A.
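As an illustration, a minimal NumPy sketch (the matrix A below is an arbitrary example):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])

# Full SVD: U is 3x3, s holds the singular values, Vt is 2x2.
U, s, Vt = np.linalg.svd(A)

# Rebuild the (rectangular) Sigma and verify A = U @ Sigma @ Vt.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
print(np.allclose(A, U @ Sigma @ Vt))  # True
```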
**Q. 8. What is Deep Learning?**
**Answer:** Deep learning is a subfield of machine learning that uses artificial neural
networks with multiple layers (deep neural networks) to learn hierarchical representations from data.
**Q. 9. What is a support vector in SVM?**
**Answer:** Support vectors are the data points closest to the decision boundary
(hyperplane) in a Support Vector Machine (SVM). They are critical in defining the margin
and influencing the position and orientation of the hyperplane.
**Q. 10. Give the name of u-filter feature selection methods.**
**Answer:** "u-filter" appears to be a typographical rendering of *univariate filter* feature selection methods. Common univariate filter methods include:
* Chi-square test
* ANOVA F-value
* Mutual Information
**PART-B (8 marks each)**
**Q. 1. Explain the k-nearest neighbor algorithm with an example.**
**Answer:**
The k-nearest neighbors (k-NN) algorithm is a simple, supervised machine learning
algorithm used for classification and regression.
* **How it works:**
1. Given a data point to classify, the algorithm calculates the distance between this
point and all other points in the training set.
2. It then selects the k-nearest data points (neighbors) to the point being classified.
3. For classification, the algorithm assigns the data point to the class that is most
frequent among its k neighbors.
4. For regression, it predicts the value based on the average (or median) of the values
of its k neighbors.
* **Example:**
* Suppose we want to classify a new flower based on its sepal length and petal length.
We have a training set of flowers labeled as either "Iris setosa" or "Iris versicolor."
* If we set k = 3 and the 3 nearest neighbors to the new flower are 2 "Iris setosa" and 1 "Iris versicolor," the algorithm classifies the new flower as "Iris setosa" by majority vote, as sketched in the code below.
**Q. 2. Suppose you are given the following set of training examples. Each attribute can
take on one of three nominal values: a, b, or c.**
(Table provided in the question)
**(a) How would a Naive Bayes classifier classify the example A1 = a, A2 = c, A3 = b?
Show all steps.**
**(b) How would a Naive Bayes classifier classify the example A1 = c, A2 = c, A3 = a?
Show all steps.**
**Answer:**
**(a) Classification of A1 = a, A2 = c, A3 = b**
1. **Calculate Prior Probabilities:**
* P(C1) = 2/6 = 1/3
* P(C2) = 4/6 = 2/3
2. **Calculate Likelihood Probabilities:**
* P(A1 = a | C1) = 1/2
* P(A1 = a | C2) = 1/4
* P(A2 = c | C1) = 1/2
* P(A2 = c | C2) = 2/4 = 1/2
* P(A3 = b | C1) = 0/2 = 0
* P(A3 = b | C2) = 1/4
3. **Calculate Posterior Scores** (Naive Bayes drops the shared normalizing constant, so these are proportional to the true posteriors):
* P(C1 | A1 = a, A2 = c, A3 = b) ∝ P(C1) · P(A1 = a | C1) · P(A2 = c | C1) · P(A3 = b | C1) = (1/3) · (1/2) · (1/2) · 0 = 0
* P(C2 | A1 = a, A2 = c, A3 = b) ∝ P(C2) · P(A1 = a | C2) · P(A2 = c | C2) · P(A3 = b | C2) = (2/3) · (1/4) · (1/2) · (1/4) = 1/48
4. **Classify:**
* Since P(C2 | A1 = a, A2 = c, A3 = b) > P(C1 | A1 = a, A2 = c, A3 = b), the Naive Bayes
classifier would classify the example as **C2**.
**(b) Classification of A1 = c, A2 = c, A3 = a**
1. **Prior Probabilities:** (Same as in part a)
* P(C1) = 1/3
* P(C2) = 2/3
2. **Likelihood Probabilities:**
* P(A1 = c | C1) = 1/2
* P(A1 = c | C2) = 2/4 = 1/2
* P(A2 = c | C1) = 1/2
* P(A2 = c | C2) = 1/2
* P(A3 = a | C1) = 1/2
* P(A3 = a | C2) = 2/4 = 1/2
3. **Posterior Scores** (again proportional, dropping the normalizer):
* P(C1 | A1 = c, A2 = c, A3 = a) ∝ P(C1) · P(A1 = c | C1) · P(A2 = c | C1) · P(A3 = a | C1) = (1/3) · (1/2) · (1/2) · (1/2) = 1/24
* P(C2 | A1 = c, A2 = c, A3 = a) ∝ P(C2) · P(A1 = c | C2) · P(A2 = c | C2) · P(A3 = a | C2) = (2/3) · (1/2) · (1/2) · (1/2) = 2/24 = 1/12
4. **Classify:**
* Since P(C2 | A1 = c, A2 = c, A3 = a) > P(C1 | A1 = c, A2 = c, A3 = a), the Naive Bayes classifier would classify the example as **C2**. The arithmetic can be reproduced with the short script below.
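A quick check of the part (b) arithmetic, with the priors and likelihoods taken directly from the steps above:

```python
from fractions import Fraction as F

# Priors and likelihoods exactly as computed in part (b).
prior = {"C1": F(1, 3), "C2": F(2, 3)}
likelihoods = {
    "C1": [F(1, 2), F(1, 2), F(1, 2)],  # P(A1=c|C1), P(A2=c|C1), P(A3=a|C1)
    "C2": [F(1, 2), F(1, 2), F(1, 2)],  # P(A1=c|C2), P(A2=c|C2), P(A3=a|C2)
}

for c in ("C1", "C2"):
    score = prior[c]
    for p in likelihoods[c]:
        score *= p
    print(c, score)  # C1 1/24, C2 1/12 -> classify as C2
```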
**Q. 3. Explain the FP-Growth algorithm for frequent pattern generation. Give a suitable
example and all computational steps with diagrams.**
**Answer:**
The FP-Growth algorithm is an efficient method for mining frequent itemsets without
candidate generation. It uses a tree structure called an FP-tree to compress the dataset
and extract frequent patterns.
* **Steps:**
1. **Scan the database and find the support for each item.**
2. **Remove infrequent items:** Eliminate items with support less than the minimum
support threshold.
3. **Sort items by frequency:** Sort the remaining items in descending order of their
support.
4. **Construct the FP-tree:**
* Create the root of the FP-tree and label it as "null."
* For each transaction:
* Sort the frequent items in the transaction according to their frequency order.
* Insert the ordered frequent items into the FP-tree. If items share a prefix, they
share the corresponding part of the tree. Increment the count of each node as it is visited.
5. **Mine the FP-tree:**
* Start from the bottom of the FP-tree.
* For each frequent item:
* Find all paths that contain the item (conditional pattern base).
* Construct a conditional FP-tree for each item.
* Generate frequent itemsets from the conditional FP-trees.
* **Example:**
Let's consider a simplified transaction database with min\_sup = 2:
| TID | Items |
| :-----: | :-------: |
| T1 | a, b |
| T2 | b, c |
| T3 | a, b, c |
| T4 | a, d |
| T5 | b, d |
| T6 | a, b |
1. **Item Support:**
* a: 4, b: 5, c: 2, d: 2
2. **Frequent Items:**
* All items are frequent (support >= 2).
3. **Sorted Items:**
* b: 5, a: 4, c: 2, d: 2
4. **FP-tree Construction:** (described textually in place of a diagram)
* Create the root of the FP-tree and label it "null."
* Insert each transaction with its items in the global frequency order (b, a, c, d); transactions that share a prefix share the same branch, and each node's count is incremented on every visit.
* The resulting tree, with final counts:
* null → b:5 → a:3 → c:1 (T1, T3, T6 share the b → a prefix; T3 adds c)
* null → b:5 → c:1 (T2)
* null → b:5 → d:1 (T5)
* null → a:1 → d:1 (T4)
5. **Mine the FP-tree** (bottom-up, one item at a time):
* For d: conditional pattern base {(a: 1), (b: 1)}; neither prefix item reaches min\_sup = 2, so no frequent itemset beyond {d} itself.
* For c: conditional pattern base {(b a: 1), (b: 1)}; only b has conditional count 2, giving {b, c}: 2 ({a, c} and {a, b, c} have support 1 and are discarded).
* For a: conditional pattern base {(b: 3)}, giving {a, b}: 3.
* Frequent itemsets (min\_sup = 2): {a}, {b}, {c}, {d}, {a, b}: 3, {b, c}: 2
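A brute-force support count over the six transactions (not FP-Growth itself, just a check that the frequent itemsets above are correct):

```python
from itertools import combinations

transactions = [
    {"a", "b"}, {"b", "c"}, {"a", "b", "c"},
    {"a", "d"}, {"b", "d"}, {"a", "b"},
]
min_sup = 2
items = sorted(set().union(*transactions))

# Count the support of every candidate itemset; print the frequent ones.
for size in range(1, len(items) + 1):
    for cand in combinations(items, size):
        support = sum(set(cand) <= t for t in transactions)
        if support >= min_sup:
            print(set(cand), support)
# Output: {a}: 4, {b}: 5, {c}: 2, {d}: 2, {a, b}: 3, {b, c}: 2
```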
**Q. 4. A neural network takes two binary inputs, x1, x2 ∈ {0, 1}, and the activation function is the binary threshold function... Design a neural network to compute the AND Boolean function. Consider the truth table of the AND Boolean function; weights are (2, 2) and bias is -3.**
**Answer:**
To design a neural network to compute the AND Boolean function with given weights and
bias:
* **Truth Table for AND:**
| x1 | x2 | x1 AND x2 |
| :----: | :----: | :-----------: |
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |
* **Neural Network Structure:**
* Two input nodes: x1, x2
* One neuron in the output layer
* Weights: w1 = 2, w2 = 2
* Bias: b = -3
* Activation function: `h(z) = 1 if z > 0, 0 otherwise`
* **Computation:**
* The neuron calculates the weighted sum of the inputs and adds the bias: z = w1\*x1 +
w2\*x2 + b
* Then, it applies the activation function to the result: output = h(z)
* **Verification:**
* x1 = 0, x2 = 0: z = 2\*0 + 2\*0 - 3 = -3; h(-3) = 0
* x1 = 0, x2 = 1: z = 2\*0 + 2\*1 - 3 = -1; h(-1) = 0
* x1 = 1, x2 = 0: z = 2\*1 + 2\*0 - 3 = -1; h(-1) = 0
* x1 = 1, x2 = 1: z = 2\*1 + 2\*1 - 3 = 1; h(1) = 1
This neural network correctly computes the AND function, as the short script below also verifies.
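A minimal sketch of the single neuron described above:

```python
# Single-neuron AND gate with the given weights and bias.
w1, w2, b = 2, 2, -3

def h(z):
    # Binary threshold activation: fire only when z > 0.
    return 1 if z > 0 else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        z = w1 * x1 + w2 * x2 + b
        print(x1, x2, h(z))  # matches the AND truth table
```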
**Q. 5. What is reinforcement learning? Give an example of reinforcement learning with all
steps.**
**Answer:**
Reinforcement learning is a type of machine learning where an agent learns to interact
with an environment by taking actions to maximize cumulative reward. The agent is not
told explicitly how to achieve the task but must discover it by trial and error.
* **Example: Training a robot to navigate a maze**
1. **Environment:** The maze, with a starting point, walls, and a goal.
2. **Agent:** The robot.
3. **Actions:** The robot can move: Up, Down, Left, Right.
4. **State:** The robot's current position in the maze.
5. **Reward:**
* \+1 for reaching the goal.
* \-0.1 for each step (to encourage faster solutions).
* \-1 for hitting a wall.
6. **Learning Process:**
* The robot starts at the beginning.
* It takes a random action.
* The environment gives it the next state and a reward.
* The agent updates its policy (strategy) based on the reward. For example, if an
action leads to a positive reward, the agent is more likely to take that action again.
* This process repeats many times (episodes) until the agent learns an optimal policy
to reach the goal.
7. **Policy:** The agent learns a policy that maps states to the best actions. One common way to implement the learning process is sketched below.
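As an illustration of step 6, here is a minimal tabular Q-learning sketch (Q-learning is one standard RL algorithm, assumed here since the question names none; the maze is simplified to a 1-D corridor, and the reward scheme follows the answer above):

```python
import random

# Corridor "maze": states 0..4 in a row, goal at state 4.
# Actions: 0 = left, 1 = right. Hitting a wall just clamps the position.
N_STATES, GOAL = 5, 4
ACTIONS = (0, 1)
alpha, gamma, epsilon = 0.1, 0.9, 0.2  # learning rate, discount, exploration

Q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action]

def step(state, action):
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    reward = 1.0 if nxt == GOAL else -0.1  # +1 at goal, -0.1 per step
    return nxt, reward

for episode in range(500):
    s = 0
    while s != GOAL:
        # Epsilon-greedy: mostly exploit the current Q-table, sometimes explore.
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[s][act])
        s2, r = step(s, a)
        # Q-learning update: nudge Q(s, a) toward r + gamma * max_a' Q(s', a').
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# The learned greedy policy should choose "right" (1) in every non-goal state.
print([max(ACTIONS, key=lambda act: Q[s][act]) for s in range(N_STATES - 1)])
```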
**Q. 6. Write short notes on the following:**
**(a) Principal component analysis**
**(b) Independent component analysis**