Unit 1
● E (Experience): Refers to past data, including labeled data (e.g., whether someone
is obese or not).
● T (Task): Refers to the specific task, such as classifying new data.
● P (Performance Measure): Measures how well the system performs, like the
accuracy of classifying obesity.
Examples
Supervised learning is a type of machine learning where the model is trained on a labeled
dataset. This means that each training example includes both the input data and the correct
output (or label). The goal is for the model to learn the mapping between inputs and outputs
so that it can predict the output for unseen data.
Key Characteristics:
● Labeled Data: Each training data point has a corresponding correct output (label).
● Learning Process: The algorithm learns by adjusting its parameters based on the
errors it makes during training. The performance is evaluated by comparing
predictions with the true output labels.
● Output: The model’s task is to predict an output based on new input data.
Example:
Algorithms Used:
● Decision Trees
● K-Nearest Neighbors (KNN)
● Support Vector Machines (SVM)
● Naive Bayes
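As a rough illustration of one algorithm from this list, the sketch below implements K-Nearest Neighbors with a majority vote. The height/weight data and the function name `knn_predict` are made up for this example and are not part of the notes.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical labeled data: (height in cm, weight in kg) -> label
train_X = [(170, 60), (165, 55), (180, 72), (170, 95), (160, 88), (175, 110)]
train_y = ["not obese", "not obese", "not obese", "obese", "obese", "obese"]

print(knn_predict(train_X, train_y, (168, 58)))
print(knn_predict(train_X, train_y, (172, 100)))
```

In practice the features would usually be scaled first, since KNN relies on raw distances.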
2. Regression
● Definition: In regression, the output variable is continuous, meaning that the model
predicts a value within a range, such as a number.
● Goal: To predict a continuous value based on input features.
Example:
● Predicting the price of a house based on features like location, square footage, and
number of bedrooms.
● Estimating sales revenue based on past data.
Algorithms Used:
● Linear Regression
● Polynomial Regression
● Ridge Regression
● Lasso Regression
1. Training Phase:
○ The algorithm is given a labeled dataset consisting of input-output pairs.
○ It learns the relationship between the inputs and outputs, adjusting its model
parameters to minimize errors (e.g., through gradient descent or other
optimization techniques).
2. Prediction Phase:
○ After training, the model can make predictions on unseen data by applying
the learned relationship.
3. Evaluation:
○ The model’s performance is assessed using metrics like accuracy (for
classification) or mean squared error (for regression).
○ Cross-validation or separate test datasets are often used to ensure the model
generalizes well to new data.
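The three phases can be sketched with a one-feature linear model trained by gradient descent on the mean squared error. This is a minimal sketch: the data (generated from y = 2x + 1), the learning rate, and the step count are assumptions chosen for illustration.

```python
def fit_linear(xs, ys, lr=0.01, steps=5000):
    """Training phase: fit y ~ w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    m = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 / m * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / m * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def mse(xs, ys, w, b):
    """Evaluation: mean squared error of the fitted line on (xs, ys)."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical noise-free training data from y = 2x + 1
train_x = list(range(10))
train_y = [2 * x + 1 for x in train_x]
w, b = fit_linear(train_x, train_y)

# Prediction phase on inputs that were not seen during training
test_x = [10, 11]
test_y = [21, 23]
test_mse = mse(test_x, test_y, w, b)
print(round(w, 3), round(b, 3), round(test_mse, 6))
```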
1. Spam Detection:
○ Input: Features of an email (e.g., word frequency, sender, subject).
○ Output: Class label (spam or non-spam).
○ Algorithm: Naive Bayes classifier.
2. Credit Scoring:
○ Input: Information about an individual (e.g., income, credit history, loan
amount).
○ Output: A score or classification (e.g., good or bad credit risk).
○ Algorithm: Logistic Regression.
3. House Price Prediction:
○ Input: Features like location, size, and number of rooms.
○ Output: A continuous value representing the price of the house.
○ Algorithm: Linear Regression.
● Clear Objective: Since the data is labeled, the goal of learning is clear and
well-defined.
● Effective for Known Tasks: Supervised learning is effective when examples of the
input-output relationship are available and the problem is well-defined (e.g.,
classification, regression).
● Wide Range of Applications: It is widely applicable in real-world scenarios like
speech recognition, image classification, and financial predictions.
Definition: Unsupervised learning is a type of machine learning in which models are trained
using unlabeled data. Unlike supervised learning, there are no corresponding output labels
in the data. The model is tasked with identifying patterns, structures, and relationships within
the data on its own.
Example: Given an input dataset containing images of cats and dogs, the algorithm will
group similar images together based on their features without knowing which image belongs
to which category (cat or dog). This process is done through clustering.
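A minimal clustering sketch using k-means is given below. The 2-D points stand in for image features and are invented for this example; the starting centroids are fixed by hand so the run is repeatable, whereas real implementations usually pick them randomly.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: alternate an assignment step and a centroid-update step."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical unlabeled feature vectors forming two visible groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(centroids)
print(clusters)
```

No labels are used anywhere: the groups emerge only from distances between points.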
1. Discover Hidden Insights: It can reveal important patterns and structures within the
data.
2. Similarity to Human Learning: Just as humans learn from experience,
unsupervised learning algorithms adapt to data without labeled examples.
3. Works with Unlabeled Data: It can handle data that is unlabeled and uncategorized,
which is common in many real-world applications.
4. Real-World Relevance: In many cases, labeled data isn't available, making
unsupervised learning essential.
1. Input Data:
○ The model is provided with data that does not include any labels or
predefined outputs.
2. Pattern Recognition:
○ The algorithm tries to discover patterns or structures within the data. It may
look for similarities between data points (clustering) or attempt to reduce the
number of dimensions while preserving the essential data (dimensionality
reduction).
3. Output:
○ The result could be a set of clusters, reduced dimensions, or an identified
structure that helps in understanding the data better or preparing it for further
processing.
4. Evaluation:
1. Clustering: Grouping similar data points together based on their features. Examples:
○ K-Means Clustering
○ Hierarchical Clustering
○ DBSCAN
2. Association: Identifying relationships between variables in large datasets. Example:
○ Market Basket Analysis: "People who buy bread are likely to buy butter as
well."
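The bread-and-butter rule can be quantified with support and confidence, two measures commonly used in market basket analysis. The transactions below are hypothetical, and the helper names are chosen for this sketch only.

```python
# Hypothetical shopping baskets
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Estimated probability of `consequent` given that `antecedent` was bought."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"bread", "butter"}))
print(confidence({"bread"}, {"butter"}))
```

Here 3 of the 5 baskets contain both items and 4 contain bread, so the rule "bread implies butter" has support 3/5 and confidence 3/4.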
● No Labeled Data Needed: Unlabeled data is easier to obtain than labeled data.
● Ability to Handle Complex Tasks: Can be applied to complex problems where
supervised learning is not feasible.
● Discovering New Patterns: Helps in discovering unknown patterns and
relationships in data.
● Difficult to Evaluate: Without labeled data, it's hard to measure the model's
accuracy or performance.
● Less Accurate: Results may not be as precise as supervised learning due to lack of
labeled outputs.
● Higher Complexity: Understanding the results and finding the right algorithm can be
more challenging.
Unsupervised learning plays a crucial role in tasks like market analysis, anomaly detection,
and customer segmentation, where labeled data is scarce or unavailable.
Reinforcement Learning (RL)
1. Agent: The entity that interacts with the environment, makes decisions, and learns
from its actions.
2. Environment: The external system with which the agent interacts. The environment
provides feedback based on the agent's actions.
3. Actions: The decisions made by the agent that affect the environment.
4. States: The conditions or situations that describe the environment at any given time.
5. Rewards: Positive or negative feedback received from the environment after
performing an action. A higher reward signifies a good action, while a penalty is given
for bad actions.
6. Policy: A strategy that the agent uses to decide which actions to take in different
states.
7. Value Function: A function that estimates how good it is for an agent to be in a given
state, often used to predict long-term rewards.
Through repeated interaction with the environment, the agent optimizes its policy to
maximize the cumulative reward over time.
Example:
In a video game scenario, an RL agent might control a character that needs to avoid
obstacles and collect rewards. For each good action (like collecting a reward or avoiding an
obstacle), the agent receives a positive feedback signal (reward), and for each bad action
(such as hitting an obstacle), the agent receives a negative feedback signal (penalty). Over time,
the agent learns to maximize its total score by improving its decision-making.
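The agent, environment, actions, states, rewards, and policy can be seen together in a very small sketch. It uses tabular Q-learning, one common RL algorithm that these notes do not cover, on a hypothetical five-cell corridor: the reward sits in the last cell and bumping into the left wall is penalized. All constants are assumptions for illustration.

```python
import random

N_STATES = 5                 # corridor cells 0..4; cell 4 holds the reward
ACTIONS = [-1, +1]           # index 0 = move left, index 1 = move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2

def step(state, action):
    """Environment: returns (next_state, reward, done)."""
    target = state + action
    if target < 0:
        return state, -1.0, False          # penalty for hitting the wall
    if target == N_STATES - 1:
        return target, 1.0, True           # reward collected, episode ends
    return target, 0.0, False

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy policy; ties are broken at random
            if rng.random() < EPSILON or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            best_next = 0.0 if done else max(q[next_state])
            q[state][a] += ALPHA * (reward + GAMMA * best_next - q[state][a])
            state = next_state
    return q

q = train()
policy = ["right" if q[s][1] > q[s][0] else "left" for s in range(N_STATES - 1)]
print(policy)
```

The table `q` plays the role of an action-value function, and the greedy choice over it is the learned policy.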
Applications of Reinforcement Learning:
1. Game Playing: Training agents to play games (e.g., AlphaGo, chess, or video
games).
2. Robotics: Enabling robots to learn and improve their behavior in real-world tasks like
navigation and manipulation.
3. Autonomous Vehicles: Teaching self-driving cars to make decisions for safe and
efficient driving.
4. Finance: Optimizing trading strategies and investment portfolios.
5. Healthcare: Personalized treatment recommendations and drug discovery.
6. Advertising: Optimizing bidding strategies for digital ads.
● Autonomous Learning: The agent can learn without labeled data, making it
applicable to real-world situations where labeled data is hard to come by.
● Dynamic Adaptation: It adapts to changing environments through continuous
learning and feedback.
● Optimization of Long-Term Goals: Focuses on maximizing long-term cumulative
rewards rather than short-term gains.
Machine learning (ML) plays a crucial role in enabling AI systems to function autonomously,
adapt to new data, and improve over time. Here’s how ML fits into various AI applications:
● AI Task: Interpreting and analyzing visual information (image and video recognition).
● ML Role: ML models (like convolutional neural networks) are trained on large
datasets of images to detect objects, faces, and text in images. This is used in facial
recognition, object detection, and even medical imaging analysis.
4. Predictive Analytics
6. Fraud Detection
Definition:
PAC Learning is a framework in machine learning, introduced by Leslie Valiant, which aims
to formalize learning by ensuring that the learner finds a hypothesis that is "probably" correct
(with high confidence) and "approximately" accurate (within a small error margin).
Key Components:
1. Probably: The learner has a high probability (1−δ) of finding a good hypothesis.
2. Approximately: The hypothesis learned is close to the correct one, with a small error
(ϵ).
3. Correct: The hypothesis performs well on unseen data.
Goal:
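The objective is to find a hypothesis h in the hypothesis space (H) such that:
P(error(h) ≤ ϵ) ≥ 1 − δ
Where:
● error(h) is the true error of h on unseen data drawn from the same distribution as
the training data.
● ϵ is the maximum acceptable error.
● δ is the maximum acceptable probability of failure, so 1 − δ is the confidence.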
1. Efficiency: The learning algorithm should run in polynomial time relative to the input
size and hypothesis complexity.
2. Sufficient Data: The learner must have enough training samples to meet the
specified error and confidence levels.
Example:
● The PAC learning algorithm ensures that with 95% confidence (1−δ = 0.95), the
hypothesis has at most 5% error (ϵ = 0.05) on unseen data.
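To get a feel for "sufficient data", the sketch below evaluates a standard textbook bound that is not stated in these notes: for a finite hypothesis space H and a learner that outputs a hypothesis consistent with the training data, m ≥ (1/ϵ)(ln|H| + ln(1/δ)) samples are enough. The value |H| = 1000 is a hypothetical choice.

```python
import math

def pac_sample_bound(hypothesis_count, epsilon, delta):
    """Number of samples sufficient for a consistent learner over a finite hypothesis space."""
    return math.ceil((math.log(hypothesis_count) + math.log(1 / delta)) / epsilon)

# 95% confidence and at most 5% error, with an assumed |H| = 1000
print(pac_sample_bound(1000, epsilon=0.05, delta=0.05))  # 199
```

The bound grows only logarithmically with |H| and 1/δ, but linearly with 1/ϵ.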
Importance:
PAC Learning provides a theoretical foundation for understanding how much data and
computational resources are required to learn effectively, ensuring generalization from finite
data.
Version Spaces
Definition:
A version space is the subset of all possible hypotheses in the hypothesis space H that are
consistent with the observed training examples. It is used in concept learning to represent
the set of candidate hypotheses that correctly classify the training data.
Key Concepts:
A consistent hypothesis in machine learning is one that correctly predicts the target values
for all the training examples. In other words, a hypothesis is consistent if it perfectly matches
the training data, meaning there are no errors between the predicted and actual values in the
training set.
Formally, h is consistent if h(xi) = yi for every training example (xi, yi); that is, the
hypothesis h correctly predicts the output yi for every input xi in the training set.
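A consistency check can be sketched for conjunctive hypotheses, where each position holds either a required attribute value or "?" (any value). The attributes and examples below are hypothetical, and the function names are chosen for this sketch.

```python
def covers(hypothesis, example):
    """True if every constraint in the hypothesis matches the example ('?' matches anything)."""
    return all(h == "?" or h == x for h, x in zip(hypothesis, example))

def is_consistent(hypothesis, training_data):
    """True if the hypothesis labels every training example correctly."""
    return all(covers(hypothesis, x) == label for x, label in training_data)

# Hypothetical examples: (Shape, Color, Size) -> positive or negative
training_data = [
    (("Circle", "Red", "Large"), True),
    (("Triangle", "Red", "Large"), True),
    (("Circle", "Blue", "Large"), False),
    (("Circle", "Red", "Small"), False),
]

print(is_consistent(("?", "Red", "Large"), training_data))   # True
print(is_consistent(("Circle", "?", "?"), training_data))    # False
```

The version space is exactly the set of hypotheses in H for which `is_consistent` returns True.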
The goal is to iteratively narrow down the general boundary G and the specific boundary S
as more training examples are provided, converging to the target hypothesis.
Example:
Training Data:
● Attributes: Shape (Circle, Triangle), Color (Red, Blue), Size (Small, Large).
● Target Concept: Objects that are Red and Large.
Initial H:
Limitations:
Applications:
Hypothesis Space:
The hypothesis space (H) represents the set of all possible hypotheses (or functions) that a
learning algorithm can consider to map inputs to outputs for a given task. It is defined by:
● The representation language (e.g., decision trees, linear models, neural networks).
● The constraints or assumptions imposed on the hypothesis.
For example:
Inductive Bias:
Inductive bias refers to the set of assumptions a machine learning algorithm makes to
generalize from the training data to unseen examples. Without inductive bias, learning is
impossible since the model would have no preference for one hypothesis over another.
1. Restrictive Bias:
○ Reduces the hypothesis space by limiting the form of hypotheses.
○ Example: Linear regression assumes the data follows a linear relationship.
2. Preference Bias:
○ Considers all hypotheses but prefers some over others based on criteria like
simplicity or likelihood.
○ Example: Decision trees prefer smaller trees with fewer splits.
● The choice of hypothesis space and inductive bias directly impacts the performance,
generalization, and efficiency of a machine learning algorithm.
● Too restrictive a bias limits flexibility; too broad a hypothesis space makes
generalization difficult. Finding the right balance is key to effective learning.
1. Find-S Algorithm
The Find-S (Find-Specific) algorithm is a simple method used in concept learning to find
the most specific hypothesis h that is consistent with the given training examples.
1. Initialize h:
○ Set the initial hypothesis h to the most specific hypothesis in the hypothesis
space, h=⟨∅,∅,…,∅⟩
2. Iterate through the training examples:
○ For each positive example:
■ Compare h with the example.
■ If h does not already cover the example, generalize h minimally so that it does.
3. Ignore negative examples:
○ The algorithm does not modify h for negative examples.
4. Output the final hypothesis h:
○ This is the most specific hypothesis consistent with all positive examples.
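The steps above can be sketched as follows for conjunctive hypotheses. `None` stands in for the most specific hypothesis ⟨∅,∅,…,∅⟩, and the Shape/Color/Size examples are hypothetical data chosen for illustration.

```python
def find_s(examples):
    """Return the most specific conjunctive hypothesis covering all positive examples."""
    h = None  # stands for the most specific hypothesis <∅, ∅, ..., ∅>
    for attributes, is_positive in examples:
        if not is_positive:
            continue  # Find-S ignores negative examples
        if h is None:
            h = tuple(attributes)  # first positive example: copy it exactly
        else:
            # Minimal generalization: keep matching values, replace mismatches with '?'
            h = tuple(hv if hv == av else "?" for hv, av in zip(h, attributes))
    return h

# Hypothetical examples: (Shape, Color, Size) -> positive or negative
examples = [
    (("Circle", "Red", "Large"), True),
    (("Circle", "Blue", "Large"), False),
    (("Triangle", "Red", "Large"), True),
    (("Circle", "Red", "Small"), False),
]

print(find_s(examples))  # ('?', 'Red', 'Large')
```

Because negative examples are never inspected, Find-S cannot detect when the training data is inconsistent with every conjunctive hypothesis.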
Example of Find-S Algorithm:
1. https://www.youtube.com/watch?v=O6vwN74aSGY&list=PL4gu8xQu0_5JBO1FKRO5p20wc8DprlOgn&index=33
2. https://www.youtube.com/watch?v=SD6MQLC2DdQ&list=PL4gu8xQu0_5JBO1FKRO5p20wc8DprlOgn&index=34
3. Training Data:
2. Candidate Elimination Algorithm
https://www.youtube.com/watch?v=l-Uk3jDFrWI&list=PL4gu8xQu0_5JBO1FKRO5p20wc8DprlOgn&index=38
The Candidate Elimination Algorithm finds all hypotheses consistent with the training data
by maintaining both the General Boundary (G) and Specific Boundary (S).
1. Initialize S and G:
○ S: Set to the most specific hypothesis ⟨∅,∅,…,∅⟩
○ G: Set to the most general hypothesis ⟨?,?,…,?⟩
2. For each training example:
○ If the example is positive:
■ Remove hypotheses from G that do not cover the example.
■ Generalize S minimally to include the example, ensuring consistency
with G.
○ If the example is negative:
■ Remove hypotheses from S that cover the example.
■ Specialize G minimally to exclude the example, ensuring consistency
with S.
3. Repeat for all examples.
4. Output S and G:
○ The version space is the region between S and G.
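A compact sketch of these steps for conjunctive hypotheses is shown below. It assumes noise-free data and keeps S as a single hypothesis, which is enough for conjunctions; `None` stands in for ⟨∅,∅,…,∅⟩. The attribute domains and examples are hypothetical.

```python
def covers(h, x):
    """True if conjunctive hypothesis h matches example x ('?' matches any value)."""
    return all(hv == "?" or hv == xv for hv, xv in zip(h, x))

def more_general_or_equal(h1, h2):
    """True if h1 covers every example that h2 covers (None is the empty hypothesis)."""
    return h2 is None or all(a == "?" or a == b for a, b in zip(h1, h2))

def candidate_elimination(examples, domains):
    S = None                               # most specific boundary
    G = [tuple("?" for _ in domains)]      # most general boundary
    for x, is_positive in examples:
        if is_positive:
            # Drop general hypotheses that miss x, then generalize S minimally
            G = [g for g in G if covers(g, x)]
            S = tuple(x) if S is None else tuple(
                s if s == xv else "?" for s, xv in zip(S, x))
        else:
            if S is not None and covers(S, x):
                raise ValueError("data is not consistent with any conjunction")
            specialized = []
            for g in G:
                if not covers(g, x):
                    specialized.append(g)
                    continue
                # Minimal specializations: fix one '?' to a value that excludes x
                for i, values in enumerate(domains):
                    if g[i] != "?":
                        continue
                    for value in values:
                        candidate = g[:i] + (value,) + g[i + 1:]
                        if value != x[i] and more_general_or_equal(candidate, S):
                            specialized.append(candidate)
            specialized = list(dict.fromkeys(specialized))  # remove duplicates
            # Keep only the maximally general members
            G = [g for g in specialized
                 if not any(o != g and more_general_or_equal(o, g) for o in specialized)]
    return S, G

domains = [("Circle", "Triangle"), ("Red", "Blue"), ("Small", "Large")]
examples = [
    (("Circle", "Red", "Large"), True),
    (("Circle", "Blue", "Large"), False),
    (("Triangle", "Red", "Large"), True),
    (("Circle", "Red", "Small"), False),
]
S, G = candidate_elimination(examples, domains)
print(S, G)
```

With these four examples both boundaries end at ('?', 'Red', 'Large'), so the version space has converged to a single hypothesis.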
Solved Problems:
1. https://www.youtube.com/watch?v=O2wYwFOMQ24&list=PL4gu8xQu0_5JBO1FKRO5p20wc8DprlOgn&index=39
2. https://www.youtube.com/watch?v=VMoPY9Wimi4&list=PL4gu8xQu0_5JBO1FKRO5p20wc8DprlOgn&index=40
3. https://www.youtube.com/watch?v=kGaR2PQfqlk&list=PL4gu8xQu0_5JBO1FKRO5p20wc8DprlOgn&index=41
4. https://www.youtube.com/watch?v=8Cud5fmnvJQ&list=PL4gu8xQu0_5JBO1FKRO5p20wc8DprlOgn&index=42
5. https://www.youtube.com/watch?v=Hr96fzShANk&list=PL4gu8xQu0_5JBO1FKRO5p20wc8DprlOgn&index=43
6. https://www.youtube.com/watch?v=wrf4YuZA7Io&list=PL4gu8xQu0_5JBO1FKRO5p20wc8DprlOgn&index=45
3. List-Then-Eliminate Algorithm
https://www.youtube.com/watch?v=_FMDyEoIX3A&list=PL4gu8xQu0_5JBO1FKRO5p20wc8DprlOgn&index=37
● Overfitting occurs when the decision tree becomes overly complex and fits the
training data too closely, capturing noise or outliers.
● Causes:
○ Too many branches: Reflect anomalies or noise.
○ Excessive complexity: Results in poor generalization.
● Impact:
○ Poor accuracy on unseen data (test data).
1. Pruning:
○ Reduces tree complexity by removing irrelevant branches.
○ Improves model generalization and reduces overfitting.
○ Two types:
■ Pre-pruning (Early Stopping):
■ Stops tree growth before it fully classifies the data.
■ The current node becomes a leaf node if a stopping condition
is met.
■ Common criteria:
■ Minimum entropy or Gini Impurity threshold.
■ Minimum gain from splitting.
■ Maximum depth of the tree.
■ Minimum number of samples in a node.
■ Post-pruning:
■ Builds a complete tree and prunes nodes in a bottom-up
manner.
■ Replaces subtrees with leaf nodes if this reduces validation
error.
2. Regularization:
○ Use validation sets to determine optimal tree complexity.
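The pre-pruning criteria listed above can be combined into a single stopping test, sketched here with entropy as the impurity measure. The threshold values and the function name `should_stop` are assumptions for illustration, not recommended settings.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of the class labels at a node."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def should_stop(labels, depth, best_gain,
                max_depth=3, min_samples=5, min_entropy=0.1, min_gain=0.01):
    """Pre-pruning: turn the node into a leaf if any stopping condition is met."""
    return (
        depth >= max_depth               # maximum depth of the tree
        or len(labels) < min_samples     # minimum number of samples in a node
        or entropy(labels) <= min_entropy  # node is already (almost) pure
        or best_gain < min_gain          # best split gains too little
    )

print(should_stop(["yes"] * 6, depth=1, best_gain=0.5))        # True: pure node
print(should_stop(["yes", "no"] * 5, depth=1, best_gain=0.2))  # False: keep splitting
```

Post-pruning would instead grow the full tree first and then compare validation error with and without each subtree.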
Pre-Pruning Example:
Post-Pruning Process:
Bias and variance are two fundamental sources of error that help explain how well a
machine learning model can generalize to new, unseen data. Understanding the trade-off
between bias and variance is key to building effective models.
1. Bias
2. Variance
● Definition: Variance refers to the error introduced by the model’s sensitivity to small
changes in the training dataset. High variance indicates that the model is too
complex and learns not only the true underlying patterns but also the noise in the
training data.
● Causes of Variance:
○ The model has too many parameters (overfitting).
○ The model is highly flexible, allowing it to fit even the noise in the data.
○ Small fluctuations or variations in the training data result in large changes in
the model's predictions.
● Characteristics:
○ High Variance: The model performs very well on the training data but poorly
on new, unseen test data. This happens when the model learns to fit the
noise in the training data, resulting in overfitting.
○ Low Variance: The model’s predictions are stable and consistent across
different training sets. The model does not react strongly to small changes in
the data.
● Example:
○ A very deep decision tree with a large number of branches (high variance)
might perfectly classify the training data but fail to generalize to unseen data
because it has learned too much noise.
Bias-Variance Trade-off
There is a fundamental trade-off between bias and variance in machine learning models.
This trade-off dictates the model's ability to generalize to new data:
● Low Bias, Low Variance: This is the ideal scenario. The model is able to capture the
true patterns in the data without being overly influenced by noise. It generalizes well
to unseen data. However, achieving this perfect balance is often difficult.
● Low Bias, High Variance: This is an indication of overfitting. The model fits the
training data very well, including its noise and outliers, which results in high variance.
It performs poorly on new data because it fails to generalize.
● High Bias, Low Variance: This is an indication of underfitting. The model is too
simplistic to capture the true patterns of the data. Although the predictions may be
stable (low variance), they are consistently off the mark (high bias), resulting in poor
performance on both training and test data.
● High Bias, High Variance: This is the worst case. The model not only fails to
capture the true patterns (high bias) but also reacts inconsistently to fluctuations in
the training data (high variance). It leads to poor performance on both training and
test data.
● Underfitting: Occurs when the model is too simple to capture the underlying patterns
of the data, resulting in high bias and low variance. The model cannot learn enough
from the training data, which leads to poor performance on both training and test
datasets.
○ Example: Using a linear regression model to fit data that has a non-linear
relationship between input and output.
● Overfitting: Occurs when the model is too complex and learns not only the patterns
but also the noise or outliers in the training data, leading to low bias and high
variance. While it performs excellently on the training data, it fails to generalize to
new, unseen data.
○ Example: Using a very deep decision tree to classify data where the model
fits even the smallest noise points in the data.
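Both failure modes can be reproduced on a small synthetic dataset. In this sketch the data follows y = x² plus noise (an assumption for illustration); a straight line stands in for the underfitting model, and a 1-nearest-neighbour lookup that memorizes the training set stands in for the overfitting model.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

def sample(xs):
    """Hypothetical data: y = x^2 plus Gaussian noise."""
    return [x * x + rng.gauss(0, 1.0) for x in xs]

train_x = [i / 2 for i in range(-10, 11)]     # -5.0, -4.5, ..., 5.0
test_x = [x + 0.25 for x in train_x[:-1]]     # unseen points in between
train_y, test_y = sample(train_x), sample(test_x)

def fit_line(xs, ys):
    """Least-squares straight line: too simple for curved data (high bias)."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return lambda x: mean_y + slope * (x - mean_x)

def fit_memorizer(xs, ys):
    """1-nearest-neighbour lookup: reproduces the training noise exactly (high variance)."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

line = fit_line(train_x, train_y)
memorizer = fit_memorizer(train_x, train_y)
for name, model in [("line", line), ("memorizer", memorizer)]:
    print(name, round(mse(model, train_x, train_y), 2), round(mse(model, test_x, test_y), 2))
```

The line has a large error even on the training data, while the memorizer has zero training error but a non-zero error on the unseen points.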
StandardScaler is a feature scaling technique that standardizes data by transforming it to
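have a mean of 0 and a standard deviation of 1. It uses the formula:
z = (x − μ) / σ
Where:
● x is the original feature value.
● μ is the mean of the feature.
● σ is the standard deviation of the feature.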
This scaling ensures that each feature contributes equally, improving the performance of
algorithms like K-Means, PCA, and regression models, especially those sensitive to feature
magnitudes.
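Standardization can be sketched in a few lines without any library: subtract the feature's mean from each value and divide by its standard deviation. The function name `standardize` and the height values are hypothetical; the population standard deviation is used here, which is also what scikit-learn's StandardScaler computes.

```python
import statistics

def standardize(values):
    """Rescale one feature to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)   # population standard deviation
    return [(v - mean) / std for v in values]

heights = [150.0, 160.0, 170.0, 180.0, 190.0]   # hypothetical feature values in cm
scaled = standardize(heights)
print([round(z, 3) for z in scaled])
```

A feature with zero standard deviation (all values equal) would cause a division by zero and needs separate handling.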
Cost Function of Linear Regression: