
MIT School of Computing

Department of Computer Science & Engineering

Third Year Engineering


23CSE3006 -MACHINE LEARNING
Class - T.Y. (SEM-II)
Unit I: Introduction to Machine Learning
Name Of the Course Coordinator:
Prof. Aarti Pimpalkar
Team Members
1. Prof. Dr. Nilima Kulkarni
2. Prof. Abhishek Das
3. Prof. Dattatray Kale
4. Prof. Nilesh Kulal

AY 2025-2026 SEM-II

Syllabus: Unit I (9 hours)


• Introduction to machine learning;

• Applications, and motivation;

• programming approach vs. machine learning approach in Artificial Intelligence;

• components of a learning problem (such as data, model, and error functions);

• process of learning (training);

• testing, bias and variance error; accuracy, confusion-matrix,

• precision-recall; over-fitting; under-fitting;

• role of cross validation; regularization; bias-variance analysis



Definition of Machine Learning:

• Machine learning is a branch of artificial intelligence (AI) and computer science which focuses on the use of data and algorithms to imitate the way that humans learn, gradually improving its accuracy.

• Definition by Arthur Samuel: the field of study that gives computers the capability to learn without being explicitly programmed.

• In very simple terms, machine learning (ML) can be explained as automating and improving the learning process of computers based on their experiences, without being explicitly programmed, i.e. without human assistance.

Biggest Confusion: AI Vs ML Vs Deep Learning


Machine Learning Terminology


• Algorithm: A machine learning algorithm is a set of rules and statistical techniques used to learn patterns from data and draw significant information from it.
• Model: A model is what is produced by training a machine learning algorithm on data.
• Predictor Variable: One or more features of the data that can be used to predict the output.
• Response Variable: The feature or output variable that needs to be predicted using the predictor variable(s).
• Training Data: The machine learning model is built using the training data. The training data helps the model identify the key trends and patterns essential to predict the output.
• Testing Data: After the model is trained, it must be tested to evaluate how accurately it can predict an outcome. This is done with the testing data set.

How does Machine Learning work?

https://www.youtube.com/watch?v=yTwCA7H03cc


Applications of Machine Learning


Types of Machine Learning Algorithm (3 Types)


• Supervised Learning: Supervised learning is a technique in which we teach or train the machine using well-labeled data, i.e. data in which each example is tagged with the correct output.


Cont.…
• Unsupervised Learning : Unsupervised learning involves training by
using unlabeled data and allowing the model to act on that
information without guidance.


Cont.…
• Reinforcement Learning is a part of machine learning where an agent is placed in an environment and learns to behave in that environment by performing certain actions and observing the rewards it gets from those actions.


WHY IS MACHINE LEARNING IMPORTANT?


• Data is the lifeblood of all business. Data-driven decisions increasingly make the difference
between keeping up with competition or falling further behind. Machine learning can be the
key to unlocking the value of corporate and customer data and enacting decisions that keep a
company ahead of the competition.

• MACHINE LEARNING USE CASES


1. Manufacturing. Predictive maintenance and condition monitoring
2. Retail. Upselling and cross-channel marketing
3. Healthcare and life sciences. Disease identification and risk stratification
4. Travel and hospitality. Dynamic pricing
5. Financial services. Risk analytics and regulation
6. Energy. Energy demand and supply optimization

• Why Machine Learning



• Applications of machine learning.


1. In the retail business, machine learning is used to study consumer behavior.
2. In finance, banks analyze their past data to build models to use in credit applications, fraud detection, and
the stock market.
3. In manufacturing, learning models are used for optimization, control, and troubleshooting.
4. In medicine, learning programs are used for medical diagnosis.
5. In telecommunications, call patterns are analyzed for network optimization and maximizing the quality of
service.
6. In science, large amounts of data in physics, astronomy, and biology can only be analyzed fast enough by computers. The World Wide Web is huge; it is constantly growing, and searching for relevant information cannot be done manually.
7. In artificial intelligence, it is used to teach a system to learn and adapt to changes so that the system
designer need not foresee and provide solutions for all possible situations.
8. It is used to find solutions to many problems in vision, speech recognition, and robotics.
9. Machine learning methods are applied in the design of computer-controlled vehicles to steer correctly
when driving on a variety of roads.

programming approach vs. machine learning approach in Artificial Intelligence


• Approach to Problem Solving:
• Traditional Programming: In traditional programming, a programmer writes explicit rules or
instructions for the computer to follow. These rules dictate exactly how the computer should process
input data to produce the desired output. It requires a deep understanding of the problem and a clear
way to encode the solution in a programming language.

• Machine Learning: In machine learning, instead of writing explicit rules, a programmer trains a
model using a large dataset. The model learns patterns and relationships from the data, enabling it to
make predictions or decisions without being explicitly programmed for each possibility. This approach
is particularly useful for complex problems where defining explicit rules is difficult or impossible.
• Data Dependency:
• Traditional Programming: Relies less on data. The quality of the output depends mainly on the logic
defined by the programmer.
• Machine Learning: Heavily reliant on data. The quality and quantity of the training data significantly
impact the performance and accuracy of the model.
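The contrast can be made concrete with a small sketch. The spam-filter task, keyword list, and example messages below are purely illustrative assumptions (not from the slides); it assumes scikit-learn is installed.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Traditional programming: the programmer writes the rule explicitly.
def is_spam_rule_based(message: str) -> bool:
    keywords = ["free", "winner", "prize"]            # hand-coded rule
    return any(word in message.lower() for word in keywords)

# Machine learning: the rule is learned from labeled examples (data-dependent).
messages = ["Free prize inside", "Meeting at 10am", "You are a winner", "Lunch tomorrow?"]
labels = [1, 0, 1, 0]                                 # 1 = spam, 0 = not spam
vec = CountVectorizer()
X = vec.fit_transform(messages)                       # turn text into word-count features
model = MultinomialNB().fit(X, labels)                # the model learns the pattern from data

print(is_spam_rule_based("Claim your free prize"))                 # rule-based decision
print(model.predict(vec.transform(["Claim your free prize"])))     # learned decision
```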

• Flexibility and Adaptability:


• Traditional Programming: Has limited flexibility. Changes in the problem domain
require manual updates to the code.
• Machine Learning: Offers higher adaptability to new scenarios, especially if the model
is retrained with updated data.
• Problem Complexity:
• Traditional Programming: Best suited for problems with clear, deterministic logic.
• Machine Learning: Better for dealing with complex problems where patterns and
relationships are not evident, such as image recognition, natural language processing, or
predictive analytics.
• Development Process:
• Traditional Programming: The development process is generally linear and
predictable, focusing on implementing and debugging predefined logic.
• Machine Learning: Involves an iterative process where models are trained, evaluated, and fine-tuned. This process can be less predictable and more experimental.

Components of a learning problem


The learning process, whether by a human or a machine, can be divided into four
components, namely, data storage, abstraction, generalization, and evaluation. The figure
illustrates the various components and the steps involved in the learning process.


Machine Learning High Level Steps


1. Collecting Data
2. Preparing the Data
3. Choosing a Model
4. Training the Model
5. Evaluating the Model
6. Parameter Tuning
7. Making Predictions
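As a rough illustration, the seven steps can be mapped onto a few scikit-learn calls. This is only a sketch: the iris dataset, the logistic-regression model, and the parameter grid are assumptions chosen for brevity.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)                                   # 1. collect data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)                            # 2. prepare the data
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))  # 3. choose a model
model.fit(X_train, y_train)                                          # 4. train the model
print("test accuracy:", model.score(X_test, y_test))                 # 5. evaluate the model
search = GridSearchCV(model, {"logisticregression__C": [0.1, 1, 10]}, cv=5)
search.fit(X_train, y_train)                                          # 6. parameter tuning
print(search.best_estimator_.predict(X_test[:5]))                    # 7. make predictions
```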


What Is Training Data?

Training data (or a training dataset) is the initial data used to train machine learning models.
Training datasets are fed to machine learning algorithms to teach them how to make
predictions or perform a desired task.

What Is the Difference Between Training Data and Testing Data?

Training data is the initial dataset you use to teach a machine learning application to
recognize patterns or perform to your criteria, while testing or validation data is used to
evaluate your model’s accuracy.


Training, Testing, and Validation Sets


• The training set to actually train the
algorithm
• The validation set to keep track of how
well it is doing as it learns
• The test set to produce the final
results.
Why use both validation and testing?
• Validation helps improve a model's
performance before testing
• Testing provides a realistic assessment
of a model's performance
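One common way to obtain the three sets is two successive splits; the 60/20/20 proportions and the iris dataset below are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
# First hold out the test set, then split the remainder into training and validation sets.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)
# Result: 60% training, 20% validation, 20% test.
```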

What is Cross-Validation?
• Cross validation is a technique used in machine learning to evaluate the performance of a model
on unseen data. It involves dividing the available data into multiple folds or subsets, using one
of these folds as a validation set, and training the model on the remaining folds. This process is
repeated multiple times, each time using a different fold as the validation set. Finally, the results
from each validation step are averaged to produce a more robust estimate of the model’s
performance.
What is cross-validation used for?
• The main purpose of cross validation is to prevent overfitting, which occurs when a model is
trained too well on the training data and performs poorly on new, unseen data. By evaluating the
model on multiple validation sets, cross validation provides a more realistic estimate of the
model’s generalization performance, i.e., its ability to perform well on new, unseen data.
Types of Cross-Validation
• There are several types of cross validation techniques, including k-fold cross validation,
leave-one-out cross validation, and Holdout validation, Stratified Cross-Validation. The
choice of technique depends on the size and nature of the data, as well as the specific requirements of the modeling problem.

CROSS VALIDATION (K-FOLD)


Cross-validation (we will refer to it as CV from here on) is a technique used to test a model's ability to predict unseen data, i.e. data not used to train the model. CV is useful when we have limited data and our test set is not large enough. There are many different ways to perform CV. In general, CV splits the training data into k blocks. In each iteration, the model trains on k-1 blocks and is validated using the remaining block. We can use multiple iterations of CV to reduce variability, and we use the average error over all the iterations to evaluate the model.
A model with better CV performance is always preferred. Similarly, we can also use CV to tune model parameters.

K-fold Cross-Validation Steps:

1. Split the training data into k equal parts.
2. Fit the model on k-1 parts and calculate the test error using the fitted model on the k-th part.
3. Repeat k times, using each data subset as the test set once (usually k = 5 to 20).
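A minimal sketch of these steps with scikit-learn; the dataset, the decision-tree model, and k = 5 are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
cv = KFold(n_splits=5, shuffle=True, random_state=0)             # split data into k blocks
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=cv)
print(scores)          # one validation score per held-out block
print(scores.mean())   # average over all k iterations evaluates the model
```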

What is Bias?

• Bias is the difference between the values predicted by the machine learning model and the correct values.
• High bias gives a large error on the training data as well as the testing data.
• It is recommended that an algorithm should be low-biased to avoid the problem of underfitting.
• With high bias, the predictions follow an overly simple (e.g. straight-line) form and thus do not fit the data set accurately. Such fitting is known as underfitting of data.


VARIANCE
•When a model performs well on the training data but poorly on
unseen (test or validation) data, it indicates that the model has high
variance.
•It basically tells how scattered the predicted values are from the
actual values.
•The variability of model prediction for a given data point which tells
us the spread of our data is called the variance of the model.
•The model with high variance has a very complex fit to the training
data and thus is not able to fit accurately on the data which it hasn’t
seen before.
•As a result, such models perform very well on training data but have
high error rates on test data. When a model has high variance, it is
said to overfit the data.

▪ Bias: error on training data as well as testing data.

▪ Variance: error on test data despite good performance on training data.



Bias-Variance Trade-Off
• While building the machine learning model, it is really important to take care
of bias and variance in order to avoid overfitting and underfitting in the
model.
• If the model is very simple with fewer parameters, it may have low variance
and high bias.
• Whereas, if the model has a large number of parameters, it will have high
variance and low bias.
• So, it is required to make a balance between bias and variance errors, and this
balance between the bias error and variance error is known as the
Bias-Variance trade-off.

For an accurate prediction of the model, algorithms need a low variance


and low bias. But this is not possible because bias and variance are
related to each other:
•If we decrease the variance, it will increase the bias.
•If we decrease the bias, it will increase the variance.
Bias-Variance trade-off is a central issue in supervised learning. Ideally,
we need a model that accurately captures the regularities in training data
and simultaneously generalizes well with the unseen dataset.
Unfortunately, doing both simultaneously is not possible: a high-variance
algorithm may perform well on training data but may overfit to noisy data,
whereas a high-bias algorithm produces a much simpler model that may not
even capture important regularities in the data. So, we need to find a
sweet spot between bias and variance to build an optimal model.
•Bias-Variance trade-off is about finding the sweet spot that balances
bias and variance errors.
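The trade-off can be seen by fitting models of increasing complexity to the same noisy data. The synthetic data and the polynomial degrees below are assumptions for illustration: degree 1 underfits (high bias), degree 15 overfits (high variance).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 1, 60)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 60)   # noisy sine curve
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for degree in (1, 4, 15):   # too simple (high bias), balanced, too complex (high variance)
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_tr, y_tr)
    print(degree,
          mean_squared_error(y_tr, model.predict(X_tr)),   # training error
          mean_squared_error(y_te, model.predict(X_te)))   # test error
```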

• Confusion Matrix?
A Confusion matrix is an N x N matrix used for evaluating the performance of a classification
model, where N is the total number of target classes. The matrix compares the actual target
values with those predicted by the machine learning model.


Type I and Type II Errors:


•Type I and Type II errors are very common in machine learning and statistics.
•Type I error occurs when the Null Hypothesis (H0) is mistakenly rejected. This is also
referred to as the False Positive Error.
•Type II error occurs when a Null Hypothesis that is actually false is accepted. This is
also referred to as the False Negative Error.
There are two errors that could potentially occur:

• Type I error (false positive): the test result says you have coronavirus, but you actually don't.

• Type II error (false negative): the test result says you don't have coronavirus, but you actually do.

• Important Terms in a Confusion Matrix

1. True Positive (TP) : The predicted value matches the actual value, or the predicted class matches the actual
class.
The actual value was positive, and the model predicted a positive value.
2. True Negative (TN) : The predicted value matches the actual value, or the predicted class matches the actual
class.
The actual value was negative, and the model predicted a negative value.

3. False Positive (FP) – Type I Error : The predicted value was falsely predicted.
• The actual value was negative, but the model predicted a positive value.

4. False Negative (FN) – Type II Error : The predicted value was falsely predicted.
• The actual value was positive, but the model predicted a negative value.
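A small sketch of how these four counts can be read off with scikit-learn; the labels below are made up for illustration.

```python
from sklearn.metrics import confusion_matrix

y_actual    = [1, 0, 1, 1, 0, 0, 1, 0]
y_predicted = [1, 0, 0, 1, 0, 1, 1, 0]
# For binary labels, ravel() returns the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_actual, y_predicted).ravel()
print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)
```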


What is accuracy?
Accuracy is a metric that measures how often a machine learning model correctly predicts the
outcome. You can calculate accuracy by dividing the number of correct predictions by the total
number of predictions.
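The slide's formula image is not reproduced here; in terms of the confusion-matrix counts defined earlier, the standard definition is:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$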


What is precision?
• Precision is a metric that measures how often a machine learning model correctly predicts the positive class.
You can calculate precision by dividing the number of correct positive predictions (true positives) by the total
number of instances the model predicted as positive (both true and false positives).

What is recall?
• Recall is a metric that measures how often a machine learning model correctly identifies positive instances
(true positives) from all the actual positive samples in the dataset. You can calculate recall by dividing the
number of true positives by the number of positive instances. The latter includes true positives (successfully
identified cases) and false negative results (missed cases).
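In terms of the confusion-matrix counts, these two definitions read:

$$\text{Precision} = \frac{TP}{TP + FP} \qquad \text{Recall} = \frac{TP}{TP + FN}$$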


F1 SCORE
F1 score is the harmonic mean of precision and recall (sensitivity).
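Written out, that is:

$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$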


A classification model predicts whether a customer will churn (leave the service). The model's predictions on a test set are evaluated against the actual outcomes, resulting in the following confusion matrix. Based on this confusion matrix, calculate the following Level 2 metrics:
1. Accuracy 2. Precision 3. Recall 4. F1-Score


A bank's machine learning model is designed to detect fraudulent credit card transactions.
A test set of 200 transactions is used to evaluate the model's performance.
•Of the 150 genuine transactions, the model correctly identified 135 as genuine.
•Of the 50 fraudulent transactions, the model correctly identified 40 as fraudulent.
Based on this information, calculate the following metrics:
1. Accuracy 2. Precision for fraudulent transactions
3. Recall for fraudulent transactions
4. F1-Score for fraudulent transactions

Construct the Confusion Matrix


First, let's identify the four key values (TP, TN, FP, FN) from the provided information.
•True Positive (TP): The model correctly predicted fraudulent transactions.
• Given: 40 fraudulent transactions were correctly identified.
• TP=40
•True Negative (TN): The model correctly predicted genuine transactions.
• Given: 135 genuine transactions were correctly identified.
• TN=135
•False Positive (FP): The model incorrectly predicted genuine transactions as
fraudulent.
• Calculation: Total genuine transactions - correctly identified genuine
transactions.
• FP=150−135=15
•False Negative (FN): The model incorrectly predicted fraudulent transactions as
genuine.
• Calculation: Total fraudulent transactions − correctly identified fraudulent transactions.
• FN = 50 − 40 = 10
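Using these counts (TP = 40, TN = 135, FP = 15, FN = 10), the metrics work out as follows:

$$\text{Accuracy} = \frac{40 + 135}{200} = 0.875$$
$$\text{Precision} = \frac{40}{40 + 15} \approx 0.727$$
$$\text{Recall} = \frac{40}{40 + 10} = 0.80$$
$$F_1 = 2 \cdot \frac{0.727 \times 0.80}{0.727 + 0.80} \approx 0.762$$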

What is underfitting
• Underfitting occurs when a model is not able to make accurate predictions based on
training data and hence, doesn’t have the capacity to generalize well on new data.
Another case of underfitting is when a model is not able to learn enough from training
data (Fig), making it difficult to capture the dominating trend (the model is unable to
create a mapping between the input and the target variable).


What is overfitting
• A model is considered overfitting when it does extremely well on training
data but fails to perform on the same level on the validation data (like the
child who memorized every math problem in the problem book and would
struggle when facing problems from anywhere else). An overfitting model fails
to generalize well, as it learns the noise and patterns of the training data to the
point where it negatively impacts the performance of the model on new data
(fig).


Reasons for Underfitting

• The model is too simple, so it may not be able to represent the complexities in the data.
• The input features used to train the model are not adequate representations of the underlying factors influencing the target variable.
• The size of the training dataset is not big enough.
• Excessive regularization is used to prevent overfitting, which constrains the model from capturing the data well.
• Features are not scaled.
Reasons for Overfitting:
• High variance and low bias.
• The model is too complex.
• The size of the training data.

How to avoid Overfitting and Underfitting?


• To solve the issue of overfitting and underfitting, it is important to choose an appropriate model for the given dataset.
• Hyperparameter tuning can also be performed.

• For overfitting, reducing the model complexity can help; similarly, for underfitting, the model complexity can be increased.
• As overfitting can be caused by too many features in the dataset and underfitting by too few, during feature engineering the number of features can be decreased or increased to avoid overfitting and underfitting respectively.


What is Regularization in Machine Learning?


• Regularization refers to techniques that are used to calibrate machine learning
models in order to minimize the adjusted loss function and prevent overfitting
or underfitting.


Regularization Techniques
1. Ridge Regularization:
It modifies over-fitted or under-fitted models by adding a penalty equivalent to the sum of the squares of the magnitudes of the coefficients.
This means that the mathematical function representing our machine learning model is minimized and the coefficients are calculated. The magnitudes of the coefficients are squared and added. Ridge regression performs regularization by shrinking the coefficients. The function below shows the cost function of ridge regression:
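The cost-function image from the slide is not reproduced; one standard way to write the ridge cost, where y_i is the actual value, ŷ_i the prediction, β_j the coefficients, and λ the penalty strength, is:

$$J_{\text{ridge}} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$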


2. Lasso Regression:
• It modifies over-fitted or under-fitted models by adding a penalty equivalent to the sum of the absolute values of the coefficients.
• Lasso regression also performs coefficient minimization, but instead of squaring the magnitudes of the coefficients, it uses their absolute values. Because of this, lasso can shrink some coefficients exactly to zero, effectively removing those features from the model. Consider the cost function for lasso regression:
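Likewise, a standard form of the lasso cost (same notation as above), followed by a minimal scikit-learn usage sketch; the dataset and alpha values are arbitrary assumptions, with alpha playing the role of the penalty λ.

$$J_{\text{lasso}} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge, Lasso

X, y = load_diabetes(return_X_y=True)
ridge = Ridge(alpha=1.0).fit(X, y)   # shrinks coefficients toward zero
lasso = Lasso(alpha=0.1).fit(X, y)   # can drive some coefficients exactly to zero
print(ridge.coef_)
print(lasso.coef_)
```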




What is Feature Engineering?


• Feature engineering is the process of turning raw data into useful features that help improve the performance of
machine learning models. It includes choosing, creating and adjusting data attributes to make the model’s predictions
more accurate. The goal is to make the model better by providing relevant and easy-to-understand information.
• A feature or attribute is a measurable property of data that is used as input for machine learning algorithms. Features can be numerical, categorical, or text-based, representing essential aspects of the data that are relevant to the problem. For example, in housing price prediction, features might include the number of bedrooms, location, and property age.


Importance of Feature Engineering


Feature engineering can significantly influence model performance. By refining features, we can:
•Improve accuracy: Choosing the right features helps the model learn better, leading to more accurate
predictions.
•Reduce overfitting: Using fewer, more important features helps the model avoid memorizing the data
and perform better on new data.
•Boost interpretability: Well-chosen features make it easier to understand how the model makes its
predictions.
•Enhance efficiency: Focusing on key features speeds up the model’s training and prediction process,
saving time and resources.


Steps in Feature Engineering


Feature engineering can vary depending on the specific problem but the general steps are:
Data Cleansing: Identify and correct errors or inconsistencies in the dataset to ensure data
quality and reliability.
Data Transformation: Transform raw data into a format suitable for modeling including
scaling, normalization and encoding.
Feature Extraction: Create new features by combining or deriving information from
existing ones to provide more meaningful input to the model.
Feature Selection: Choose the most relevant features for the model using techniques like
correlation analysis, mutual information and stepwise regression.
Feature Iteration: Continuously refine features based on model performance by adding,
removing or modifying features for improvement.


1. Feature Creation: Feature creation involves generating new features from domain knowledge or by observing patterns in the data. It can be:
• Domain-specific: created based on industry knowledge, like business rules.
• Data-driven: derived by recognizing patterns in data.
• Synthetic: formed by combining existing features.

2. Feature Transformation: Transformation adjusts features to improve model learning:
• Normalization & scaling: adjust the range of features for consistency.
• Encoding: converts categorical data to numerical form, e.g. one-hot encoding.
• Mathematical transformations: like logarithmic transformations for skewed data.

3. Feature Extraction: Extracting meaningful features can reduce dimensionality and improve model accuracy:
• Dimensionality reduction: techniques like PCA reduce features while preserving important information.
• Aggregation & combination: summing or averaging features to simplify the model.

4. Feature Selection: Feature selection involves choosing a subset of relevant features to use:
• Filter methods: based on statistical measures like correlation.
• Wrapper methods: select based on model performance.
• Embedded methods: feature selection integrated within model training.

5. Feature Scaling: Scaling ensures that all features contribute equally to the model:
• Min-max scaling: rescales values to a fixed range like 0 to 1.
• Standard scaling: normalizes to have a mean of 0 and variance of 1.
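A small sketch of two of these transformations, one-hot encoding and min-max scaling, using pandas and scikit-learn; the toy housing data below is made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({
    "bedrooms": [2, 3, 4],
    "location": ["urban", "suburban", "urban"],
    "age": [5, 20, 1],
})
encoded = pd.get_dummies(df, columns=["location"])           # one-hot encoding
encoded[["bedrooms", "age"]] = MinMaxScaler().fit_transform(
    encoded[["bedrooms", "age"]])                            # min-max scaling to [0, 1]
print(encoded)
```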
