Chapter 1

Chapter 1 discusses various concepts in machine learning, including the purpose of the discriminant function, the convergence of the perceptron learning algorithm, and the differences between supervised and reinforcement learning. It highlights the importance of the hypothesis space, the distinction between linear and non-linear algorithms, and the role of domain experts in validating model outputs. Additionally, it covers the significance of conditional probability, the training process for linear classification models, and introduces the Cocktail Party Algorithm as a challenge in unsupervised learning.


Questions and Answers: Chapter 1

Alaa Tharwat

Q1: What is the purpose of the discriminant function?

Answer: A discriminant function g(x) assigns a scalar value to each input vector x, which is used to assign it to a
specific class. Its purpose is to separate the (multidimensional) feature space in such a way that:

• Class Separation: The decision is unambiguous due to the sign (for binary classification) or the maximum among
several discriminant functions (for multi-class classification). For example, if g(x) > 0, the input x belongs to class 1;
if g(x) < 0, it belongs to class 2.
• Ranking: Value intervals allow an estimation of the certainty (“margin”) of the decision. For instance, if g(x1 ) = 2
and g(x2 ) = 0.5, we can infer that the classification for x1 is more certain than for x2 .
• Formulation of the Decision Area: The equation g(x) = 0 defines a line or, in higher dimensions, a hypersurface that separates
the classes. For instance, in a 2D feature space, the line defined by g(x1 , x2 ) = 0 separates the feature space into two
regions, each corresponding to one of the classes.

Example: Consider a simple case of classifying flowers based on two features: petal length and petal width. The
discriminant function could be defined as:

g(x) = w1 · petal length + w2 · petal width + w0

where w1 and w2 are weights determined during training, and w0 is a bias term. The line g(x) = 0 would separate the
flowers into two classes: say, "setosa" and "versicolor".
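The flower example can be sketched in a few lines of Python. The weights below are hypothetical values picked by hand for illustration, not trained ones, and the rule for g(x) = 0 (assigning the point to "versicolor") is an arbitrary tie-breaking assumption.

```python
def discriminant(x, w, w0):
    """Linear discriminant g(x) = w . x + w0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0


def classify(x, w, w0):
    """Assign a class from the sign of g(x); ties (g == 0) go to 'versicolor'."""
    return "setosa" if discriminant(x, w, w0) > 0 else "versicolor"


# Hypothetical weights and bias (hand-picked, not learned from data).
w = [-1.0, -1.0]
w0 = 4.0

small_flower = [1.4, 0.2]  # petal length, petal width
large_flower = [4.5, 1.5]

for flower in (small_flower, large_flower):
    print(flower, discriminant(flower, w, w0), classify(flower, w, w0))
```

The magnitude of g(x) also illustrates the ranking point: the further the value is from zero, the further the input lies from the line g(x) = 0.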

Q2: How does the perceptron learning algorithm guarantee convergence, and what does this reveal about
the data?

Answer: The perceptron learning algorithm converges if the data is linearly separable, meaning there exists a straight
line (in two dimensions) or a hyperplane (in higher dimensions) that can perfectly divide the classes without any
misclassification.
When the algorithm encounters a misclassified example, it updates the weights based on the error, moving the decision
boundary closer to the correct position. This process is repeated for each misclassified instance until all instances are
correctly classified, or it reaches a specified number of iterations.
If the data is not linearly separable, such as when the classes are intertwined (e.g., points from Class A and Class B
are mixed together), the perceptron will not converge. It will continue to adjust the weights indefinitely without finding a
stable solution.

Alaa Tharwat
Center for Applied Data Science Gütersloh (CfADS), Hochschule Bielefeld-University of Applied Sciences, 33619 Bielefeld, Germany

In summary, the convergence of the perceptron learning algorithm is a strong indicator that the data can be cleanly
separated by a linear boundary, revealing important information about the structure of the data.
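A minimal sketch of this behaviour, assuming labels in {-1, +1}, a bias folded in as weights[0], and an iteration cap (`max_iters`, a hypothetical parameter name). The AND data set is linearly separable, while XOR is not, so the same routine stops early on the first and runs into the cap on the second.

```python
def sign(value):
    """Return +1 for positive values and -1 otherwise."""
    return 1 if value > 0 else -1


def pla(points, labels, max_iters=100):
    """Perceptron learning. Returns (weights, converged); weights[0] is the bias."""
    w = [0.0] * (len(points[0]) + 1)
    for _ in range(max_iters):
        updated = False
        for x, y in zip(points, labels):
            xa = [1.0] + list(x)  # prepend constant input for the bias
            if sign(sum(wi * xi for wi, xi in zip(w, xa))) != y:
                w = [wi + y * xi for wi, xi in zip(w, xa)]
                updated = True
        if not updated:  # a full pass without mistakes: all points are correct
            return w, True
    return w, False  # iteration cap reached without a clean pass


points = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_labels = [-1, -1, -1, 1]  # linearly separable
xor_labels = [-1, 1, 1, -1]   # not linearly separable

_, and_converged = pla(points, and_labels)
_, xor_converged = pla(points, xor_labels)
print("AND converged:", and_converged)
print("XOR converged:", xor_converged)
```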

Q3: What is the difference between supervised learning and reinforcement learning?

Answer: Supervised learning uses labeled data. Here the model learns a direct input-output mapping with clear
feedback. In reinforcement learning, an agent makes sequential decisions and receives delayed, scalar rewards instead of labeled
outcomes. This makes reinforcement learning more exploratory and more suitable for dynamic environments like games
or robotics, where the agent has to adapt in order to maximize its expected long-term reward (utility).

Q4: Why is the hypothesis space needed in the machine learning process, and what role does the learning
algorithm play within it?

Answer: The hypothesis space is the set of all possible functions that a model can consider when approximating the
unknown target function. It defines the range of potential solutions the learning algorithm can explore.
The learning algorithm plays a crucial role within this hypothesis space by navigating through it to identify and select
the hypothesis that best fits the training data. This process involves evaluating how well each hypothesis predicts the
outcomes based on the input features, typically by minimizing a loss function that quantifies the error between predicted
and actual values.
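As a toy illustration of this division of labour, the sketch below assumes a tiny, finite hypothesis space of one-dimensional threshold classifiers and a "learning algorithm" that simply picks the threshold with the lowest training error. The data and the candidate thresholds are made up for the example.

```python
def make_threshold_hypothesis(t):
    """Hypothesis h_t(x) = +1 if x >= t, else -1."""
    return lambda x: 1 if x >= t else -1


def training_error(h, data):
    """Fraction of training examples that h misclassifies (0-1 loss)."""
    return sum(1 for x, y in data if h(x) != y) / len(data)


def learn(thresholds, data):
    """Search the hypothesis space and return the threshold with the lowest error."""
    return min(
        thresholds,
        key=lambda t: training_error(make_threshold_hypothesis(t), data),
    )


# Made-up training data: (input, label) pairs.
data = [(0.5, -1), (1.0, -1), (1.5, -1), (2.5, 1), (3.0, 1), (3.5, 1)]
hypothesis_space = [0.0, 1.0, 2.0, 3.0, 4.0]  # candidate thresholds

best = learn(hypothesis_space, data)
print("chosen threshold:", best)
```

Real learning algorithms rarely enumerate hypotheses one by one; the point here is only that the algorithm can never return anything outside the set it is given.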

Q5: What is a linear algorithm, and how does it differ from a non-linear algorithm?

Answer: A linear algorithm models the relationship between input features (X) and output (Y ) using a linear equation.
For example, in a simple linear regression, the relationship can be expressed as:

y = wx + b

In higher dimensions, the equation extends to a linear combination of inputs:

y = w1 x1 + w2 x2 + . . . + wn xn + b.

In contrast, a non-linear algorithm models relationships where the output cannot be expressed as a simple linear
combination of inputs. Non-linear algorithms can include operations such as multiplication, squaring, division, or transformation
through non-linear functions like log, exp, or sin.
For example, a non-linear relationship might be expressed as:

y = w0 x0 · w1 x1 + b.

This means the output is influenced by the product of input features, capturing more complex patterns in the data.

Q6: In machine learning, what informs the choice between using a linear or non-linear model to predict the
output for a given dataset?

Answer: Several factors guide the choice between using a linear or non-linear model:

• Manual Inspection: If the output (y) increases or decreases at a roughly constant rate as an input (x) changes, a linear model may
be appropriate. A merely monotonic trend is not enough, since curves such as exponentials are monotonic as well.
• Visualization: Plotting the inputs against the output can reveal patterns. A near-straight line in the plot suggests that
a linear model might be suitable, while scattered or curved patterns indicate the need for a non-linear model.
• Start with Linear: It is often advisable to start with a linear model and evaluate its performance using error metrics
like Mean Squared Error (MSE). If the MSE is too high, indicating poor fit, it may be time to consider a non-linear
model.
• High-Dimensional Data: Complex, high-dimensional datasets often exhibit non-linear relationships. In such cases,
non-linear models may be more appropriate to capture the intricate patterns present in the data.
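The "start with linear" advice can be tried out on a toy example. The sketch below fits a least-squares line to two made-up data sets, one generated by a linear rule and one by a quadratic rule, and compares the resulting MSE; the function names are illustrative only.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = w * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b


def mse(xs, ys, w, b):
    """Mean squared error of the line y = w * x + b on the data."""
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [-3, -2, -1, 0, 1, 2, 3]
linear_ys = [2 * x + 1 for x in xs]  # generated by a linear rule
curved_ys = [x * x for x in xs]      # generated by a quadratic rule

mse_linear = mse(xs, linear_ys, *fit_line(xs, linear_ys))
mse_curved = mse(xs, curved_ys, *fit_line(xs, curved_ys))
print("MSE on linear data:", mse_linear)
print("MSE on curved data:", mse_curved)
```

On the quadratic data the best line is flat and its error stays large, which is the kind of signal that suggests trying a non-linear model.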

Q7: Machine learning uses statistical techniques to learn, but statistical results are probabilistic, operating
in a world of uncertainty and inference rather than absolutes. Does this mean domain experts are always
needed to vet the outputs of machine learning models?

Answer: Yes, machine learning models rely on statistical techniques to identify patterns in data and to generalize using
methods such as optimization and linear algebra. While machine learning can uncover valuable patterns, these patterns
may not always be meaningful or contextually relevant.
Domain experts are essential for several reasons:

• Interpreting Results: Experts provide insights into whether the identified patterns align with domain knowledge and
make sense in context.
• Validating Model Outputs: They assess the accuracy and reliability of the model’s predictions, ensuring that outputs
are valid.
• Ensuring Practical Applicability: In critical fields, such as healthcare or finance, decisions based on model outputs
can have significant consequences. Domain experts help ensure that models are applied correctly and ethically.

Q8: Why is it important that the target function in Machine Learning is unknown?

Answer: If the target function were known, learning would not be necessary. Machine Learning is required precisely
because the true relationship between inputs and outputs must be inferred from data without explicit instructions.

Q9: What happens if there is no underlying pattern in the data provided to a Machine Learning model?

Answer: If no underlying pattern exists, the Machine Learning model cannot learn meaningful relationships, and its
predictions will be no better than random guessing.

Q10: Provide an example of a non-linear relationship in a Machine Learning problem and briefly explain
it.

Answer: An example of a non-linear relationship is image recognition, where the relationship between pixel intensities
and the corresponding object category (such as distinguishing between a cat and a dog) is highly complex and cannot be
represented by a simple straight line or linear function. Neural networks are typically used to capture such non-linear
patterns through multiple layers and non-linear activation functions.

Q11: How does the training of a linear classification model work?

Answer: The training of a linear classification model involves several key steps:

1. Data Collection: First, collect the data points, where X represents the input dimensions and C represents the output
classes.
2. Weight Initialization: Initialize the weights W to small random values. This is important for breaking symmetry
during training.
3. Training Process:
• Prediction: For each input data point, compute the linear function y = W^T X + b (where b is the
bias, often initialized to 1) and threshold it to obtain the predicted class.
• Error Calculation: Identify which predicted values are incorrect by comparing them to the actual class labels.
• Weight Update: For each misclassified example, update the weights using the formula:

W ← W + α · (ytrue − ypred ) · X

where α is the learning rate.

• Repeat the prediction and updating steps until either all predictions are correct or a specified number of iterations
is reached.
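The steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes class labels 0 and 1, a step threshold at zero, and a bias that is updated like a weight on a constant input of 1 (the text only gives the update for W). The OR data set and the parameter names are made up for the example.

```python
import random


def predict(w, b, x):
    """Threshold the linear function W^T X + b to get a class label (0 or 1)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def train(X, C, alpha=0.1, max_epochs=1000, seed=0):
    """Train a linear classifier with the update W <- W + alpha * (y_true - y_pred) * X."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.01, 0.01) for _ in X[0]]  # small random initial weights
    b = 1.0                                        # bias initialized to 1
    for _ in range(max_epochs):
        mistakes = 0
        for x, y_true in zip(X, C):
            error = y_true - predict(w, b, x)      # 0 when the prediction is correct
            if error != 0:
                w = [wi + alpha * error * xi for wi, xi in zip(w, x)]
                b += alpha * error                 # assumption: bias updated like a weight
                mistakes += 1
        if mistakes == 0:                          # all predictions correct: stop
            break
    return w, b


# Made-up, linearly separable data: logical OR.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
C = [0, 1, 1, 1]

w, b = train(X, C)
print([predict(w, b, x) for x in X])
```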

Q12: Why is the discriminant function always one dimension less than the whole space?

Answer: The discriminant function defines the decision boundary in classification problems through the equation g(x) = 0. This boundary is always one
dimension less than the feature space because a single equation removes one degree of freedom from the space.
In an n-dimensional space, the decision boundary is a hyperplane of dimension n − 1. This means:

• In 2D space (where n = 2), the decision boundary is a line (1D).

• In 3D space (where n = 3), the decision boundary is a plane (2D).

This relationship holds because the discriminant function essentially captures the conditions under which the classes
change, forming a boundary that is defined by all possible points of equal likelihood for each class. Therefore, to separate
the classes effectively, the boundary must exist in a space that is one dimension lower than the space of the input features.

Q13: Why is conditional probability important in machine learning?

Answer: We use conditional probability in machine learning because, in most scenarios, our target function is not
a well-defined mathematical function. There is often noise in the data, meaning that the same input can yield different
outputs.
By using conditional probability, we can model the likelihood of an output given a specific input, rather than attempting
to predict a single, precise output. This approach allows us to account for the inherent uncertainty and variability in the
data.
For example, in classification tasks, we may want to determine the probability of different classes given the input
features. This probabilistic framework enables more robust decision-making and helps in managing uncertainty effectively.
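To make the idea concrete, the sketch below estimates P(y | x) by counting how often each output follows each input in a small, made-up data set in which the same input appears with different outputs. The weather labels are purely illustrative.

```python
from collections import Counter, defaultdict


def conditional_probabilities(samples):
    """Estimate P(y | x) from (x, y) pairs by relative frequency."""
    counts = defaultdict(Counter)
    for x, y in samples:
        counts[x][y] += 1
    return {
        x: {y: c / sum(ys.values()) for y, c in ys.items()}
        for x, ys in counts.items()
    }


# Made-up noisy data: the same input does not always give the same output.
samples = [
    ("cloudy", "rain"), ("cloudy", "rain"), ("cloudy", "dry"), ("cloudy", "rain"),
    ("sunny", "dry"), ("sunny", "dry"), ("sunny", "rain"), ("sunny", "dry"),
]

probs = conditional_probabilities(samples)
print(probs)
```

A classifier built on these estimates would predict the most probable output for each input while still reporting how uncertain that prediction is.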

Q14: How does the perceptron learning algorithm change its weight vector when it finds a misclassified point?
Why does this update improve the model's hypothesis?

Answer: In the Perceptron Learning Algorithm (PLA), a misclassified training point (xn , yn ) indicates that the current
hypothesis h(x) = sign(wT x) has incorrectly predicted the label of xn .
The PLA updates the weight vector w using the following rule:

w(t + 1) = w(t) + yn xn

For the misclassified point, the term yn xn adjusts the weight vector in the direction that improves the prediction:

• If yn = +1 and the model predicted −1, the algorithm increases the weight vector by adding xn , effectively moving
the decision boundary to better classify this point.
• Conversely, if yn = −1 and the model predicted +1, the algorithm decreases the weight vector by subtracting xn ,
shifting the boundary in the opposite direction.
This update shifts the decision boundary w^T x = 0 toward classifying xn correctly, thereby reducing the error for that specific point. As this
process is repeated over iterations, the hypothesis h ∈ H is refined, bringing it closer to the unknown target function f
over time.
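One way to see why the update helps is to look at the quantity y_n w^T x_n, which is positive exactly when x_n is classified correctly. The short derivation below uses only the update rule above and the fact that y_n is +1 or -1:

```latex
\begin{align*}
y_n\, w(t+1)^{T} x_n
  &= y_n \bigl(w(t) + y_n x_n\bigr)^{T} x_n \\
  &= y_n\, w(t)^{T} x_n + y_n^{2}\, \lVert x_n \rVert^{2} \\
  &= y_n\, w(t)^{T} x_n + \lVert x_n \rVert^{2}
  \qquad \text{since } y_n^{2} = 1 .
\end{align*}
```

For a misclassified point the first term is not positive, and the update adds the non-negative amount ||x_n||^2 to it. A single update therefore moves the score for x_n in the right direction, although it may take several updates before the point is actually classified correctly.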

Q15: If the target function f is fixed but unknown, and the goal is to approximate it using a hypothesis g
from a predefined set of hypotheses H (where g ∈ H), what criteria should we use to select H to ensure it
does not limit the model’s ability to generalize to unseen data? What should we consider before selecting
the model?

Answer: When training a model, the target function f is unspecified, meaning we attempt to approximate it using a
hypothesis g from a set of possible hypotheses, known as the hypothesis set H.
The selection of H significantly impacts generalization. Here are the key considerations:

• Model Complexity: If H is too simple, the model will struggle to capture the underlying patterns in the data, leading
to underfitting. This occurs when the model fails to learn enough from the training data and performs poorly on both
training and unseen data. Conversely, if H is too complex, the model may learn the training data patterns too closely,
including noise. This results in overfitting, where the model performs well on training data but poorly on new, unseen
data.
• Balance: It is crucial to find a balance in the complexity of H. A model that is too simple may overlook important
patterns, while a complex model may become too tailored to the training data.
• Domain Knowledge: Leverage domain knowledge to inform the selection of H. Understanding the nature of the data
and the relationships between features can guide the choice of an appropriate hypothesis set.

Q16: What are the main components of a learning problem? Give an example and identify each component
within it.

Answer: The main components of a learning problem are:

• Input (X): The features used to describe each instance.

• Output (y): The value we want to predict (e.g., price, label).
• Training Data (D): A dataset containing examples of input-output pairs (x1 , y1 ), . . . , (xn , yn ).
• Target Function ( f ): The unknown function that maps each input x to the correct output y.
• Learning Algorithm: The method used to analyze the training data D and choose a good approximation g for the
target function f .
• Hypothesis Set (H): A set of candidate functions from which the algorithm can choose.

Example: Predicting if an email is spam

• Input (X): Features could include the presence of certain words, the length of the email, etc.
• Output (y): 0 if the email is not spam, 1 if it is spam.
• Training Data (D): A collection of labeled emails, such as (email1, 0), (email2, 1), . . ..
• Target Function ( f ): The true rule that determines whether an email is spam (this is unknown).
• Learning Algorithm: For example, logistic regression, which uses the training data D to find a rule g.
• Hypothesis Set (H): The set of all logistic models that the algorithm could choose from.

Q17: Why is the concept of a discriminant function important in classification problems, and how does it
relate to the hypothesis function in a linear model like the Perceptron?

Answer: In classification problems, a discriminant function is crucial as it defines the boundary that separates different
classes within the input space. This boundary determines the regions where the model predicts distinct output labels.
In a linear model like the Perceptron, the discriminant function is represented by the equation:

wT x = 0

This equation describes a hyperplane that divides the input space into two halves: one region where the hypothesis
function h(x) = sign(wT x) = +1 (for example, to approve credit) and another where h(x) = −1 (for example, to deny
credit).
The weights w play a vital role in determining the orientation and position of this boundary.
The discriminant function directly influences the model’s classification behavior. An effective hypothesis function not
only classifies the training data correctly but also positions the boundary in such a way that new, unseen data are more
likely to be classified accurately.

Q18: What is the Cocktail Party Algorithm?

Answer: In the course, we discussed two unsupervised methods: clustering and the cocktail party problem. While I am
familiar with clustering, I initially had not heard of the cocktail party concept, prompting me to do some quick research.
The "Cocktail Party Effect" refers to the brain's ability to focus on a single speaker while filtering out other voices and
background noise. This phenomenon presents a challenge rather than a specific algorithm.
During my research, I found various interesting solutions, one of which was implemented by Siddharth Sharma and
shared on Medium. He utilized an unsupervised learning method known as Independent Component Analysis (ICA),
which is a computational technique for separating a multivariate signal into its additive subcomponents.
This approach effectively addresses the cocktail party problem by isolating individual voices from overlapping audio
signals.

Q19: Can you elaborate more on Error Functions?

Answer: In Chapter 1, we defined what an error function is and its role in learning. However, I was curious about the
different types of error functions and their specific applications.
To explore this further, I searched through the table of contents of our book, “Learning from Data: A Short Course,”
but did not find additional information. Therefore, I conducted some quick research to gather more details.
Here are some common error functions, also known as loss functions:
1. Mean Squared Error (MSE): The mean squared error is the average of the squared differences between the target
values $y_i = f(x_i)$ and the predictions $\hat{y}_i = h(x_i)$ of the current hypothesis h:

\[ \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]

MSE is useful when the model needs to be sensitive to outliers, as larger deviations from the target result in significantly
greater penalties.
2. Mean Absolute Error (MAE): This function averages the absolute differences without squaring them:

\[ \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| \]

Since each error contributes in proportion to its size rather than its square, MAE is preferable when outliers do not need to be penalized severely.
Both of these functions are commonly used in regression tasks, which will be explained in greater detail later in the
book.
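The different sensitivity to outliers is easy to check numerically. In the sketch below, the targets and the two sets of predictions are made-up numbers: one set is off by 1 everywhere, the other is exact except for a single large miss.

```python
def mse(targets, predictions):
    """Mean squared error."""
    return sum((y - p) ** 2 for y, p in zip(targets, predictions)) / len(targets)


def mae(targets, predictions):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(targets, predictions)) / len(targets)


targets = [1, 2, 3, 4]
evenly_off = [2, 3, 4, 5]    # every prediction is off by 1
one_outlier = [1, 2, 3, 14]  # three exact predictions, one off by 10

print("evenly off :", mse(targets, evenly_off), mae(targets, evenly_off))
print("one outlier:", mse(targets, one_outlier), mae(targets, one_outlier))
```

Moving from the first case to the second multiplies MAE by 2.5 but MSE by 25, because the single large error is squared.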

Q20: What’s the effect of a non-random sample on the feasibility of learning?

Answer: The goal of learning is to find a hypothesis that can generalize across the entire population. Given that we
only have a sample, can we develop a reliable hypothesis?
We have demonstrated that learning is feasible and can generate a hypothesis with a tolerance ε using Hoeffding’s
Inequality. For example, consider flipping a coin N times and recording the results as follows:

X1 = 1, X2 = 0, X3 = 1, . . .

where 1 represents heads and 0 represents tails.

Let µ = E[X] be the true probability of heads (which is unknown) and ν be the observed proportion of heads:

\[ \nu = \frac{1}{N} \sum_{i=1}^{N} X_i \]

According to Hoeffding's Inequality:

\[ P(|\nu - \mu| > \varepsilon) \le 2e^{-2\varepsilon^{2} N} \]

This inequality states that the probability of the sample estimate being off by more than ε from the true value is at most
$2e^{-2\varepsilon^{2} N}$.
While we have shown that learning is feasible and can produce a hypothesis with a certain tolerance, this holds true only
under the assumption that our sample is completely random, meaning it is free of bias. Although theoretically possible,
achieving a truly random sample is very difficult in practice.
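To get a feel for the numbers, the sketch below evaluates the right-hand side of Hoeffding's Inequality and inverts it to find how many random samples are needed for a given tolerance ε and failure probability δ. The symbol δ and the function names are assumptions introduced for this example.

```python
import math


def hoeffding_bound(n, eps):
    """Upper bound 2 * exp(-2 * eps^2 * n) on P(|nu - mu| > eps)."""
    return 2 * math.exp(-2 * eps ** 2 * n)


def samples_needed(eps, delta):
    """Smallest N for which the Hoeffding bound is at most delta."""
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))


print(hoeffding_bound(1000, 0.05))
print(samples_needed(0.05, 0.05))
```

The bound says nothing about how the sample was drawn beyond assuming independent random draws, which is why it no longer applies once the sample is biased.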
Wiem Souai discussed in her article “Impacts of Sampling Bias on Model Performance” on Medium how various types
of sampling bias can hinder a model’s generalization. Here are some common types of sampling bias:

1. Selection Bias: Occurs when data is not collected equally from the entire population, such as gathering survey
responses only from a city while ignoring suburbs.
2. Non-Response Bias: Arises when individuals refuse to answer a poll or survey.
3. Data Source Bias: Results from collecting data from a specific platform or medium.

Collecting a random sample is a complex process. The more biased a sample is, the harder it becomes to derive a
hypothesis that generalizes well across the entire population.

Q21: I would like to ask how we can analyze a large number of features when working in a company that
manufactures windows?

Answer: There are many aspects to consider, and I am interested in identifying the main features that are important for
our machine learning model to predict the production timeline for specific types of windows.
In thinking about this, I considered that we might compare all the data to see which features vary significantly and
which remain relatively constant. By doing so, we could identify and select the top five characteristics that have the most
impact on our predictions.
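Techniques like Principal Component Analysis (PCA) could help reduce dimensionality. PCA finds the directions (linear
combinations of features) along which the data varies most, and features with large weights in the leading components are
candidates for the most informative ones. Because PCA ranks directions by variance and not by their relationship to the
production timeline, the selected features should still be checked against the model's prediction error.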
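A minimal sketch of the variance-based idea, assuming three made-up features and using plain power iteration on the covariance matrix to find the first principal component. In practice a numerical library would be used; the data and function names here are hypothetical.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of the columns of `rows`."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def top_component(cov, iters=200):
    """First principal component and its variance, via power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)
    )
    return v, variance


# Made-up data: feature 0 varies a lot, feature 1 barely, feature 2 is constant.
rows = [[float(i), 0.1 * (-1) ** i, 3.0] for i in range(10)]

cov = covariance_matrix(rows)
component, variance = top_component(cov)
explained = variance / sum(cov[i][i] for i in range(len(cov)))
print("first component:", component)
print("share of total variance:", explained)
```

The weights of the first component show which original features dominate the variation, which is one rough way to shortlist candidates.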

Q22: Why does generalization remain uncertain even if a hypothesis perfectly fits the training data?

Answer: Generalization remains uncertain because multiple hypotheses may perform equally well on the training set
but can differ significantly when applied to unseen data. A perfect fit to the training data does not guarantee accurate
predictions on new examples, particularly when the true target function is unknown.
Q23: How does probability theory reconcile the contradiction between learning from finite data and the
unknown target function?

Answer: Deterministic reasoning suggests that we cannot learn anything about unseen data without knowing the
complete target function. However, probability theory provides a workaround by allowing us to estimate the likelihood
that our model will generalize based on the training data. This probabilistic approach enables learning even in the presence
of incomplete information.

Q24: Why do we distinguish between in-sample error and out-of-sample error in learning, and what is the
significance of their difference?

Answer: In-sample error (Ein ) measures how well a hypothesis fits the training data, whereas out-of-sample error (Eout )
reflects its performance on unseen data, which is ultimately what matters. The difference between these two errors, known
as the generalization gap, indicates the reliability of the learned model. A small gap suggests good generalization, while a
large gap is a warning sign of overfitting or poor learning.
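The gap can be made visible with a deliberately extreme hypothesis. The sketch below uses a 1-nearest-neighbour rule, chosen here only because it memorizes the training set, on made-up one-dimensional data with 20% label noise; the out-of-sample error is estimated on a large fresh sample.

```python
import random


def target(x):
    """Noise-free target: +1 above 0.5, -1 otherwise."""
    return 1 if x > 0.5 else -1


def make_data(n, rng, noise=0.2):
    """Draw n points in [0, 1) and flip each label with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = target(x)
        if rng.random() < noise:
            y = -y
        data.append((x, y))
    return data


def one_nearest_neighbour(train):
    """Hypothesis that returns the label of the closest training point."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]


def error(h, data):
    """Fraction of examples in `data` that h misclassifies."""
    return sum(1 for x, y in data if h(x) != y) / len(data)


rng = random.Random(0)
train_set = make_data(50, rng)
test_set = make_data(2000, rng)  # fresh sample used to estimate E_out

h = one_nearest_neighbour(train_set)
e_in = error(h, train_set)
e_out = error(h, test_set)
print("E_in:", e_in, "E_out (estimated):", e_out)
```

Because every training point is its own nearest neighbour, the in-sample error is zero, yet the memorized noise shows up as error on the fresh sample.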

Q25: What does Hoeffding’s Inequality say about the relationship between training error and true error?

Answer: Hoeffding’s Inequality provides a probabilistic bound on the difference between in-sample error (Ein ) and
out-of-sample error (Eout ). It indicates that with a sufficiently large sample size, Ein is likely to be close to Eout with high
probability. This gives us statistical confidence that learning from finite data can generalize effectively.

Q26: Why is the hypothesis set size (|H|) important in determining the feasibility of learning?

Answer: The size of the hypothesis set (|H|) is important because a larger number of hypotheses increases the
likelihood that at least one will fit the training data well purely by chance, rather than by accurately capturing the true underlying
pattern, leading to overfitting. In this scenario, Hoeffding's bound must be adjusted using the union bound.
As the number of hypotheses increases, the guarantee on the model's generalization becomes looser, unless we
compensate by increasing the number of training examples N. Therefore, keeping the hypothesis set small, or employing
techniques like regularization, is essential for making learning feasible.
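For a finite hypothesis set with M = |H| hypotheses h_1, ..., h_M, the adjustment can be written out as follows, where g is the hypothesis finally selected and δ (a symbol introduced here) is the failure probability we are willing to accept:

```latex
\begin{align*}
P\bigl(|E_{\text{in}}(g) - E_{\text{out}}(g)| > \varepsilon\bigr)
  &\le \sum_{m=1}^{M} P\bigl(|E_{\text{in}}(h_m) - E_{\text{out}}(h_m)| > \varepsilon\bigr) \\
  &\le 2M e^{-2\varepsilon^{2} N} .
\end{align*}

% Requiring the right-hand side to be at most \delta gives
\[
N \ge \frac{1}{2\varepsilon^{2}} \ln\frac{2M}{\delta} .
\]
```

The required number of examples grows only logarithmically in M, but the bound becomes meaningless when M is infinite, which is why this simple argument needs a finite hypothesis set.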

Q27: Why do we need a discriminant function and what are the requirements to use it?

Answer:

• The discriminant function divides the input space into two sets, one for each class.
• It is typically used in supervised learning.
• It requires labeled data, and is generally applicable to binary classification (two classes).
• The decision boundary is defined by w^T x = 0, that is, the set of points where the inner product of the weight vector w and the
input vector x equals zero.

Q28: What is the main difference between supervised learning and unsupervised learning in machine
learning?

Answer: In supervised learning, the model is trained on a dataset that contains correct outputs, allowing it to learn
the input-output relationship. This enables the model to make accurate predictions for new data based on the patterns it
learned from previously recorded outputs.
In contrast, unsupervised learning uses unlabeled data, allowing the model to discover structures and patterns within the
data. The goal is for the model to understand the underlying structure in a chaotic environment and to generate meaningful
insights or solutions.

Q29: How does the reward system in Reinforcement Learning affect the training process of the model?
What is the role of the reward system in the learning process of the model?

Answer: The reward system is a fundamental feature that distinguishes Reinforcement Learning (RL) from supervised
and unsupervised learning. It aims to reinforce the interaction between the model and the environment, optimizing the
selection of correct actions.
For this reason, it is crucial that the reward structure is designed effectively to ensure that the model can be trained
quickly and efficiently. If the reward system is set incorrectly, the learning process may slow down, and the model may
repeat incorrect behaviors, leading it to focus on ineffective strategies.

Q30: Types of Learning in Machine Learning and their major differences?

Answer: Supervised Learning: The model is trained on a labelled dataset, which contains both input
and output parameters. Supervised learning algorithms learn to map inputs to the correct outputs. Both
the training and validation datasets are labelled.

• Data: Labelled (input-output pairs)

• Goal: Learn a mapping from inputs to outputs
• Examples: Spam detection, image classification, stock price prediction
• Common Algorithms: Linear Regression, Decision Trees, Support Vector Machines

Unsupervised Learning: The model draws inferences from unlabelled datasets, facilitating exploratory data analysis and
enabling pattern recognition and predictive modelling. It often uses clustering algorithms to categorise data points according to
value similarity.

• Data: Unlabelled
• Goal: Discover hidden patterns or intrinsic structures in data
• Examples: Customer segmentation, anomaly detection, topic modeling
• Common Algorithms: K-Means Clustering, Hierarchical Clustering, Principal Component Analysis
Reinforcement Learning: The model (agent) learns by interacting with an environment and receiving feedback in the
form of rewards or punishments. Unlike supervised learning, it does not receive explicit correct answers but instead learns
through trial and error to maximize cumulative reward.

• Data: Feedback from actions (rewards or penalties)

• Goal: Learn optimal actions through trial and error to maximize cumulative reward
• Examples: Game playing (e.g., AlphaGo), robotic control, recommendation systems
• Common Algorithms: Q-Learning, Deep Q-Networks, Policy Gradient Methods

Type            Data Requirement        Output Type         Common Use Cases

Supervised      Labeled                 Prediction          Classification, regression tasks
Unsupervised    Unlabeled               Pattern discovery   Clustering, anomaly detection
Reinforcement   Environment feedback    Action policies     Robotics, games, sequential control

Table 1.1: Comparison of Machine Learning Types

Q31: In the Perceptron Learning Algorithm, why do we multiply our error by x?

I understand the logic behind the update rule:

wi = wi−1 + learning rate · (ytrue − ypred )

If our model’s prediction is higher than y, the result of (ytrue − ypred ) will be negative, leading to a decrease in wi
and less activation in the next iteration. If the prediction is lower, the weight will be increased.
But why then do we still need the x in

wi = wi−1 + learning rate · (ytrue − ypred ) · x?

Or as noted in the slides, under the assumption that this is only applied to misclassified samples:

w(t + 1) ← w(t) + xn yn

Answer:
When we multiply the error by x, we're essentially saying "adjust each weight in proportion to how strongly its
corresponding input feature was activated." This makes intuitive sense because:

• If a feature xi is large (e.g., xi = 5), it had a strong influence on the incorrect prediction, so its weight should be
adjusted more significantly.
• If a feature xi is small or zero (e.g., xi = 0), it had little or no influence on the prediction, so its weight shouldn’t
change much or at all.

Example:
Imagine a simplified case with just two features x1 and x2 , and we misclassify a point:

• If the point has features [x1 = 5, x2 = 0.1]

• The error (ytrue − ypred ) = 1
• Then, assuming a learning rate of 1, our weight updates would be:

– w1 would be updated by 5 × 1 = 5 (a large adjustment)

– w2 would be updated by only 0.1 × 1 = 0.1 (a small adjustment)

This makes sense because feature x1 had a much stronger presence in this example, so its corresponding weight deserves
a larger correction.
What Would Happen Without Multiplying by x?
If we didn’t multiply by x and just used

w(t + 1) ← w(t) + learning rate · (ytrue − ypred )

we would:

• Adjust all weights equally regardless of the corresponding feature values.

• Ignore the geometric relationship between the input features and the decision boundary.
• Likely take much longer to converge or possibly never converge to a solution.

The multiplication by x ensures that the learning algorithm is both mathematically sound and intuitively reasonable in
how it adjusts the decision boundary.
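The numbers from the example can be reproduced with a few lines of Python. The sketch assumes a learning rate of 1 and compares the per-weight adjustment with and without the factor x; `weight_update` is a hypothetical helper name.

```python
def weight_update(error, x, learning_rate=1.0):
    """Per-weight adjustment: learning_rate * error * x_i for each feature."""
    return [learning_rate * error * xi for xi in x]


x = [5.0, 0.1]  # feature values of the misclassified point
error = 1       # y_true - y_pred

with_x = weight_update(error, x)
without_x = [1.0 * error for _ in x]  # same step for every weight, ignoring x

print("update with x   :", with_x)
print("update without x:", without_x)
```

With the factor x, the weight of the strongly active feature moves fifty times further than the weight of the nearly inactive one; without it, both weights move by the same amount regardless of the input.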

