Topic: What is Human Learning?
Sub-topic: Definition, Need, and Types
Presented by: Navalan M
Course: B.Tech / Ph.D Coursework
Date: [Insert Date]
Slide 1: What is Human Learning?
In cognitive science, learning is the process of gaining information through observation.
It helps us:
o Walk safely
o Do homework correctly
o Launch rockets with accurate angles
📌 Learning improves performance over time
Slide 2: Why Do We Need to Learn?
Daily life requires decision-making and actions
To do a task properly, we need:
o Prior information
o Experience
📈 More learning = Higher efficiency
🧠 Example:
More math practice → fewer mistakes
More rocket launches → better safety & precision
Slide 3: Types of Human Learning
Human learning happens in 3 ways:
| Type | Description |
| --- | --- |
| 1. Expert Guidance | Direct teaching from others |
| 2. Guided by Past Knowledge | Apply what was learned before |
| 3. Self-Learning | Learn by doing, trial & error |
Slide 4: 1. Learning Under Expert Guidance
👶 Example: Baby learning words and colors from parents
🏫 School: Learning alphabets, math, grammar
🎓 College: Engineering, medicine, law
👨💼 Job: On-the-job training from seniors
📌 Mentors or teachers help because they have experience and subject knowledge
Slide 5: 2. Learning Guided by Past Knowledge
🧠 Use past knowledge to make new decisions
📌 Examples:
Baby groups objects by color without being told
Student identifies verbs from a sentence
Professional targets right customers based on old campaign data
⚠️No direct teaching involved, but learning is influenced by past expert input
Slide 6: 3. Self-Learning (Learning by Self)
Learn through experience, mistakes, and practice
Examples:
o Baby learning to walk through obstacles
o Child learning to ride a cycle
o Adult learning to drive a car
📌 We form personal checklists based on success/failure
Slide 7: Summary
✔️Human learning is essential for growth, efficiency, and problem-solving
✔️Learning can be:
Direct (from experts)
Indirect (from past lessons)
Experiential (from our own actions)
Slide 8: Questions & Discussion
💬 Let’s discuss your own real-life examples of the 3 types of learning.
Learning by Self (Trial and Error)
Example (Your Life):
The first time you used new software (like SPSS or Python for data analysis), you learned by clicking
buttons, making mistakes, and gradually understanding how to run tests. No one taught you every
step.
Student-friendly example:
A student learns how to ride a bicycle. They fall a few times, but finally succeed by trying again and
again.
What is Machine Learning? — Explained Simply
❓ Do Machines Really Learn?
Yes, machines can learn, but not the way humans do. In machine learning (ML), "learning" means:
A machine improves its performance on a task using past data (experience).
This idea is formalized in a famous definition by Tom M. Mitchell (Carnegie Mellon University):
🧠 “A computer program is said to learn from experience E, with respect to some task T and
performance measure P, if its performance on T, as measured by P, improves with experience E.”
🔄 Understanding the Definition with Examples
✅ Example 1: Playing Checkers
E (Experience): Playing many games in the past
T (Task): Playing checkers
P (Performance): % of games won
If the machine wins more games over time, it means it is learning.
✅ Example 2: Image Classification
E: Labeled images (e.g., cats, dogs, birds)
T: Predicting the correct label for a new image
P: Accuracy – how many predictions are correct
More correct classifications over time = better learning.
🧠 How Do Machines Learn?
Machine learning follows three major steps:
1. Data Input
o Machine receives historical data (e.g., exam scores, photos, sensor readings)
2. Abstraction
o It extracts patterns or "knowledge" from data
o Just like students summarize notes to focus on key points
3. Generalization
o It applies what it learned to new unseen data
o Like a student solving a new question based on concepts, not memorization
🤝 Analogy with Human Learning
Let’s compare to human learning:
| Human Learning | Machine Learning |
| --- | --- |
| Reads textbook | Inputs data |
| Highlights key points | Extracts features/patterns |
| Solves new problems in exam | Predicts on new data |
📘 Example: Animal Grouping (Human to ML Analogy)
Humans don’t memorize all animals. Instead, we group animals:
Mammals: Warm-blooded, have fur
Birds: Lay eggs, have feathers
Fishes: Live in water, lay eggs
→ So we generalize:
"If an animal has feathers and lays eggs, it's likely a bird."
Abstraction in Machine Learning
🧠 What is Abstraction?
When we feed input data to a machine, it is in raw form — like exam marks, weather
reports, or bank transactions.
This raw data cannot be used directly. It must be converted into a pattern or model — this
process is abstraction.
The model is the machine’s knowledge created from data.
🔧 Model = Summarized Knowledge
This model could be:
If/else rules (like: if salary > ₹50,000, mark as high-income)
Mathematical equations (like: y = c₀ + c₁x)
Trees/graphs (used in decision trees, social networks)
Groupings (like clustering people based on age/income)
👨💻 Who chooses the model?
Humans do. The choice depends on:
Type of problem
(e.g., prediction, classification, trend analysis)
Input data nature
(e.g., numeric/categorical, missing values, size)
Domain importance
(e.g., fraud detection needs fast, accurate results)
🧪 Example: Fitting a Model
Suppose our model is:
y = c₀ + c₁x (Linear Regression)
We must find the best values for c₀ and c₁ from training data.
This is called training the model.
Once values are found → the model can now predict outcomes for new x values.
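As a minimal sketch of this training step (using scikit-learn's LinearRegression; the data values are invented so that y = 1 + 2x):

```python
# Minimal sketch: fitting y = c0 + c1*x with scikit-learn.
# The x/y values below are invented purely for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([[1], [2], [3], [4], [5]])   # training inputs
y = np.array([3, 5, 7, 9, 11])            # training outputs (here y = 1 + 2x)

model = LinearRegression()
model.fit(x, y)                            # "training" finds c0 and c1

print(model.intercept_, model.coef_[0])    # c0 ~ 1.0, c1 ~ 2.0
print(model.predict([[6]]))                # predict for a new x -> ~13.0
```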
📘 2. Generalization
🤔 What is Generalization?
After training, the model is tested on new, unseen data.
If it performs well → it is said to generalize well.
Training = Learn from past
Generalization = Perform on future
❌ What Can Go Wrong?
1. Overfitting – model is too perfect on training data, fails on test data.
(like a student who memorizes the guide but can’t solve new questions)
2. Test data is too different from training data.
The model struggles, just like a human in a totally new situation.
🧠 Generalization is Like Human Gut-Feeling
Sometimes, even humans make decisions based on intuition or past experience.
E.g., A doctor identifies a rare disease based on symptoms they've only seen once.
Similarly, ML models may not always give perfect answers — they make educated guesses.
📘 3. Well-Posed Learning Problem (Tom Mitchell’s View)
To check if a problem is right for machine learning, ask:
✅ Step 1: What is the Problem?
Example:
I want a program to predict the next word I type.
Use Tom Mitchell's format:
T (Task): Predict next word
E (Experience): Past data (English text corpus)
P (Performance): Prediction accuracy
Also write:
Assumptions: Language is English, words follow grammar
Similar Problems: Next-song suggestion, product recommendation
✅ Step 2: Why Solve It?
Motivation: Improves typing speed, assists in writing
Solution Benefits: Saves time, improves productivity
Use Case: Used in mobile keyboards (like Gboard, SwiftKey)
✅ Step 3: How to Solve It?
Describe steps like:
1. Collect a dataset of typed sentences
2. Clean and prepare the data
3. Train a model using NLP techniques
4. Use model to predict next word
5. Test on unseen text
🧾 Summary Chart
| Term | Meaning |
| --- | --- |
| Abstraction | Convert raw data into a usable pattern (model) |
| Model | Summarized form of data (equation, tree, rules, etc.) |
| Training | Fitting the model to past (training) data |
| Generalization | Using the model on new data and still making correct predictions |
| Overfitting | Model memorizes too much, can't work on new data |
| Well-posed Problem | Task clearly defined by Task (T), Experience (E), and Performance (P) |
This process has 3 parts:
✅ 1. Data Input (Give past examples to the computer)
We give old data to the computer.
🧒 Example: You tell the computer:
| Color | Shape | Is Apple? |
| --- | --- | --- |
| Red | Round | Yes |
| Green | Round | Yes |
| Yellow | Long | No |
This is called training data. The computer will learn from this.
🔁 2. Abstraction (Computer finds patterns inside)
Now the computer thinks like this:
“Oh! Apples are usually red or green and round.
Bananas are yellow and long.”
So the computer builds a rule in its brain.
It does not just remember your table — it finds a general rule.
📈 3. Generalization (Use that rule on new data)
Now you give new fruit info:
🔸 Color: Red, Shape: Round
Computer says ➡️“Yes, this is probably an apple!”
This is called Generalization:
Using what it learned to make a smart guess about new things.
🍎 A Real-Life Example (Fruits)
You are the “machine” in this case:
You see many fruits.
You observe their color, shape.
You learn: Red & round → usually apple.
Next time someone gives you a red round fruit, you say: “This is probably an apple.”
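Below is a tiny sketch of the apple example as code. The numeric encoding of colors and shapes is an assumption made purely for illustration:

```python
# Sketch: the apple example as code. Colors/shapes are encoded as numbers
# (an illustrative assumption): Red=0, Green=1, Yellow=2; Round=0, Long=1.
from sklearn.tree import DecisionTreeClassifier

X = [[0, 0],   # Red, Round
     [1, 0],   # Green, Round
     [2, 1]]   # Yellow, Long
y = ["Yes", "Yes", "No"]          # Is it an apple?

clf = DecisionTreeClassifier().fit(X, y)   # abstraction: learn a general rule
print(clf.predict([[0, 0]]))               # generalization: new red, round fruit -> ['Yes']
```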
✅ Types of Machine Learning
Machine Learning is mainly divided into 3 types:
1. Supervised Learning
2. Unsupervised Learning
3. Reinforcement Learning
Let’s go one by one with real-world examples.
Supervised learning – also called predictive learning; the model learns from labelled data.
Unsupervised learning – also called descriptive learning; the model finds patterns in unlabelled data.
Reinforcement learning – the model learns through trial and error by interacting with an environment and receiving rewards or penalties based on its actions.
🟦 1. Supervised Learning
📌 Meaning:
In this method, the computer is given both the input and the correct output during training. It learns
by comparing its answer to the correct answer.
🧠 Think of it like a teacher supervising a student.
Predicting whether a tumour is malignant or benign
Predicting prices in domains like real estate, stocks, etc.
Are these two problems the same in nature? The answer is 'no'. Though both of them are prediction problems, in one case we are trying to predict which category or class an unknown data point belongs to, whereas in the other case we are trying to predict a real value and not a class. When we are trying to predict a categorical or nominal variable, the problem is known as a classification problem, whereas when we are trying to predict a real-valued variable, the problem falls under the category of regression.
Test data is used to evaluate how the model performs on unseen data; for example, the machine is given an image of an unknown category, also called test data, and asked to classify it. The model's prediction depends on the information it gets from past data, which we have called training data, and based on the test results, modification of the model can be done.
Typical classification problems include:
Image classification
Prediction of disease
Win–loss prediction of games
Prediction of natural calamity like earthquake, flood, etc.
Recognition of handwriting
Typical applications of regression follow the form y = ax + b, where x is the predictor variable and y is the target variable. Examples include:
Demand forecasting in retails
Sales prediction for managers
Price prediction in real estate
Weather forecast
Skill demand forecast in job market
2. Unsupervised Learning – also referred to as pattern discovery or knowledge discovery.
Meaning:
Here, only input data is given – no answers. The computer tries to find patterns or groupings on its
own.
Think of it like a student learning without a teacher.
A. Clustering – grouping or organizing similar objects together. One of the most commonly adopted similarity measures is distance.
Customer A buys: Milk, Bread, Butter
Customer B buys: Milk, Bread, Butter
Customer C buys: Soap, Shampoo
Customers A and B have similar shopping styles → distance small → same cluster.
Customer C is different → distance large → different cluster.
Distance can be based on attributes such as:
o Age difference
o Shopping item difference
o Color or shape difference
When data values are very similar, the "distance" is low.
When they are very different, the "distance" is high.
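A minimal sketch of distance-based clustering with k-means; the purchase counts are invented to mirror the three customers above:

```python
# Sketch: clustering customers by what they buy. Each row counts purchases of
# [Milk, Bread, Butter, Soap, Shampoo]; the numbers are invented for illustration.
from sklearn.cluster import KMeans

customers = [[1, 1, 1, 0, 0],   # Customer A
             [1, 1, 1, 0, 0],   # Customer B
             [0, 0, 0, 1, 1]]   # Customer C

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print(km.labels_)   # A and B share a cluster label; C gets a different one
```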
B. Association Analysis – as part of association analysis, associations between data elements are identified.
Association means: “if this happens, that also happens”
→ It finds patterns like:
“If item A is present, item B is also likely to be present.”
If many people buy bread, they also buy butter.
→ This is an association.
Example 1: Market Segmentation
A shopping website collects data like:
o Age, spending habits, items bought
It finds that there are 3 customer groups:
o Young buyers
o Senior buyers
o Bulk buyers
In a supermarket:
o “If a person buys milk and cereal, they also likely buy sugar.”
The shop can then:
o Place those items near each other
o Give combo offers (boosts sales)
🔹 Applications:
Online shopping recommendations (e.g., Amazon: “People who bought this also
bought...”)
Product placement strategies in stores
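As a rough sketch of association analysis, the pure-Python snippet below counts how often item pairs occur together in invented baskets; frequent pairs suggest rules like A → B:

```python
# Sketch: counting how often item pairs are bought together (invented baskets).
from itertools import combinations
from collections import Counter

baskets = [{"bread", "butter", "milk"},
           {"bread", "butter"},
           {"bread", "butter", "sugar"},
           {"soap", "shampoo"}]

pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Pairs appearing in many baskets suggest a rule like "bread -> butter".
print(pair_counts.most_common(1))   # [(('bread', 'butter'), 3)]
```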
📌 Simple Comparison:
| Feature | Clustering | Association Analysis |
| --- | --- | --- |
| Goal | Group similar items | Find items that occur together |
| Input Data | Unlabeled | Unlabeled |
| Output | Groups (Clusters) | Rules (like A → B) |
| Real-life Example | Customer segmentation | Market basket, product recommendation |
3. Reinforcement Learning
Meaning:
The computer learns by doing. It takes actions, and gets rewards or punishments (like
points).
Like training a dog with rewards.
💡 Example 1: Playing a Game (Chess, Mario, etc.)
The computer tries a move.
If it wins → reward.
If it loses → penalty.
Over time, it learns the best way to play.
Example 2: Self-driving Car
The car gets a reward for safe driving.
Gets penalty for accidents or going off track.
Used for:
Robotics
Game playing (AlphaGo)
Self-driving cars
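A toy sketch of the reward/penalty loop: an agent tries two actions, receives rewards, and gradually learns which action is better. The environment and all numbers are invented for illustration:

```python
# Toy sketch of learning from rewards: the agent tries actions, gets a reward,
# and updates its estimate of each action's value (all numbers invented).
import random

values = {"left": 0.0, "right": 0.0}   # estimated value of each action
counts = {"left": 0, "right": 0}

def reward(action):
    # Hidden environment: "right" pays off 80% of the time, "left" never does.
    return 1 if action == "right" and random.random() < 0.8 else 0

random.seed(0)
for step in range(1000):
    # Explore sometimes; otherwise exploit the best-known action.
    if random.random() < 0.1:
        action = random.choice(["left", "right"])
    else:
        action = max(values, key=values.get)
    r = reward(action)
    counts[action] += 1
    values[action] += (r - values[action]) / counts[action]  # running average

print(values)   # the value of "right" ends up near 0.8
```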
Summary Table:

| Type | Input Data | Output Data Given? | Goal | Example |
| --- | --- | --- | --- | --- |
| Supervised Learning | Yes | Yes | Predict correct answer | Email spam detection |
| Unsupervised Learning | Yes | No | Find hidden patterns | Customer grouping |
| Reinforcement Learning | Through actions | Reward/Penalty | Learn by trial & error | Playing games, robot navigation |
Problems That Should NOT Be Solved Using Machine Learning
A. Simple Rule-Based Logic
Example: Calculating area of a circle (Use formula, not ML)
Converting Celsius to Fahrenheit ((C × 9/5) + 32)
Example 3: Checking if a number is even or odd
These are deterministic — no need to "learn" them using data.
B. Small Datasets
ML needs data to learn patterns. Very small datasets don’t justify using ML.
Example 1: Trying to predict student marks using data of only 4 students
Example 2: Predicting disease from only 5 patient records
C. High-Risk Situations with Uncertainty
Reason: Machine Learning is often a black box. In critical areas, we need 100% trust and
explanation.
Example 1: Deciding cancer treatment without human doctor review
Example 2: Controlling nuclear reactors or air traffic
Example 3: Deciding court judgments or legal punishments
In such cases, ML can assist, but should not decide.
D. When Data Doesn’t Change Over Time
If conditions are always the same, rule-based solutions are better.
E. When Outcome is Already 100% Known and Fixed
Reason: No uncertainty or variation in output.
Example 1: Traffic light rules (Red = stop, Green = go)
F. Tasks Requiring Common Sense / Moral Judgment
Reason: ML lacks emotions and ethical thinking.
Example 1: Deciding who to save in a car accident
Example 2: Choosing a fair punishment in a family dispute
Needs human judgment, not just pattern recognition.
Issues in Machine Learning While Preparing to Model
| Issue No. | Issue | Explanation | Example |
| --- | --- | --- | --- |
| 1 | Missing Values | Data has blank/null entries, which confuses the model. | Age is missing for 20% of survey respondents. |
| 2 | Imbalanced Data | One class dominates; the model ignores the minority class. | 95% non-fraud, 5% fraud – the model may predict "non-fraud" always. |
| 3 | Overfitting | Model memorizes training data but can't generalize. | 100% accuracy on training data but only 60% on test data. |
| 4 | Underfitting | Model is too simple to capture trends. | Predicting stock prices using only day of week. |
| 5 | Feature Selection | Irrelevant or too many features reduce accuracy. | Including email ID in spam prediction adds noise. |
| 6 | Data Scaling | Features with different units affect performance. | Height in cm vs weight in kg – the scale difference confuses the model. |
| 7 | Noise in Data | Unwanted or incorrect data disturbs learning. | Text with spelling mistakes or repeated values. |
| 8 | Label Errors | Incorrect labels reduce model accuracy. | "Dog" image labeled as "Cat". |

Real-Life Example for Imbalanced Data: Fraud Detection

| Transaction | Label (Fraud or Not) |
| --- | --- |
| ₹1000 | Not Fraud |
| ₹2500 | Not Fraud |
| ₹980 | Not Fraud |
| ₹8000 | Fraud |
| ₹300 | Not Fraud |
| ₹450 | Not Fraud |

👉 Out of 1000 transactions:
950 are Not Fraud
50 are Fraud
Now, if a model always predicts "Not Fraud" for every transaction:
🤖 What the model sees:
950 correct predictions (Not Fraud)
50 wrong predictions (missed all Fraud)

Why is Feature Selection important?
If you give the model too many features, especially irrelevant ones, it gets confused. This leads to lower accuracy and longer training time.
Why is "Email ID" a bad feature?
It is unique for every person.
It has no relation to spam or not spam.
The model may start to memorize IDs, not learn patterns.
If a new email ID comes, the model fails.

🔄 Additional Issues (with Examples)

| Issue No. | Issue | Explanation | Example |
| --- | --- | --- | --- |
| 9 | Multicollinearity | Two or more features are highly correlated. | Height and leg length – both increase together. |
| 10 | Outliers | Unusual data points distort the model. | One customer spent ₹10 lakh while others spent ₹10k. |
| 11 | Data Leakage | Model accidentally uses future info from the dataset. | Using test scores to predict student admission before scores are known. |
| 12 | Redundant Features | Duplicate or repeated info adds no value. | Having both "DOB" and "Age" in the same dataset. |
| 13 | Class Overlap | Features for different classes are too similar. | Cats and small dogs in an image dataset are visually similar. |
| 14 | Unstructured Data | Raw text, audio, and images require preprocessing. | Feeding raw tweets into a model without cleaning. |
| 15 | Irrelevant Features | Data not related to the prediction task. | Including a user's favorite color to predict loan approval. |
| 16 | Sparse Data | Most values are zero or missing, making learning hard. | Product rating matrix where users rate only a few items. |
| 17 | Data Drift | Data changes over time; the model becomes outdated. | Customer behavior changed after COVID but the model uses old data. |
| 18 | Duplicate Entries | Same data appears more than once. | Two records of the same person create bias. |

More on Multicollinearity:
Multicollinearity happens when two or more input features (columns) are highly correlated, meaning they carry the same information. The model receives duplicate signals, which causes confusion during learning. For example, in housing data:
Area in square feet
Number of rooms
Total carpet area
"Area" and "Carpet Area" may be almost the same, so including both is redundant.
Label Errors in Machine Learning
What is a "Label"?
In supervised learning, labels are the correct answers (output) for the training data.
For image classification:
Input = Dog image, Label = "Dog"
For email spam detection:
Input = Email text, Label = "Spam" or "Not Spam"
What are Label Errors?
A label error means the wrong output is given for an input during training.
What is Data Leakage?
Data Leakage happens when the model accidentally uses information during training that it
shouldn’t know — especially future data or test data.
This gives the model an unfair advantage, so it performs very well in training but fails in real-world
use.
Simple Example: Student Admission Prediction
Goal: Predict whether a student will be admitted based on input features.
| Feature 1 | Feature 2 | Feature 3 | Label |
| --- | --- | --- | --- |
| Grades | Attendance | Entrance Exam Score | Admitted (Yes/No) |
Problem: If the entrance exam hasn’t happened yet (future data), using its score to predict admission
is data leakage.
📌 You're training the model with information it won’t have during real prediction.
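A minimal sketch of how to avoid one common form of leakage in code: any preprocessing (here a scaler) is fitted on the training split only. The random data is just a stand-in:

```python
# Sketch: avoiding a common form of leakage. The scaler is fit on the training
# split only; fitting it on all data would let test information "leak" in.
# The data here is random, just to make the example runnable.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.random.rand(100, 3)
y = np.random.randint(0, 2, size=100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler().fit(X_train)   # learn scaling from training data only
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)      # apply, never re-fit, on test data
```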
BASIC TYPES OF DATA IN MACHINE LEARNING
Qualitative data provides information about the quality of
an object or information which cannot be measured. For
example, if we consider the quality of performance of students
in terms of ‘Good’, ‘Average’, and ‘Poor’, it falls under the
category of qualitative data. Also, name or roll number of
students are information that cannot be measured using some
scale of measurement. So they would fall under qualitative
data. Qualitative data is also called categorical data.
Qualitative data can be further subdivided into two types as
follows:
1. Nominal data
2. Ordinal data
Nominal data is one which has no numeric value, but a named value. It is used for assigning named
values to attributes.
1. Blood group: A, B, O, AB, etc.
2. Nationality: Indian, American, British, etc.
3. Gender: Male, Female, Other
Mathematical operations such as addition, subtraction, multiplication, etc. cannot be performed on nominal data.
Ordinal data, in addition to possessing the properties of nominal data, can also be naturally ordered.
This means ordinal data also assigns named values to attributes but unlike nominal data, they can be
arranged in a sequence of increasing or decreasing value so that we can say whether a value is
better than or greater than another value. Examples of ordinal data are
1. Customer satisfaction: ‘Very Happy’, ‘Happy’, ‘Unhappy’, etc.
2. Grades: A, B, C, etc.
3. Hardness of Metal: ‘Very Hard’, ‘Hard’, ‘Soft’, etc.
Quantitative data relates to information about the quantity of an object – hence it can be measured.
For example, if we consider the attribute ‘marks’, it can be measured using a scale of measurement.
Quantitative data is also termed as numeric data. There are two types of quantitative data:
1. Interval data:
Values are numerical.
You can do addition and subtraction.
BUT there is no true zero — zero does not mean "nothing".
You cannot multiply or divide meaningfully.
You cannot say “twice as much”.
Difference is known, but no true zero
30°C is hotter than 20°C → ✅ (difference is 10°C)
40°C = 20°C + 20°C → ✅ (addition is possible)
But 40°C is not “twice as hot” as 20°C → ❌
2. Ratio data:
Values are numerical.
Has an absolute zero (zero = nothing).
You can add, subtract, multiply, and divide.
You can say “twice as much” or “half”.
📏 Example: Height, Weight, Age, Money
A person weighing 60 kg is heavier than one weighing 30 kg → ✅
60 kg = 30 kg + 30 kg → ✅
60 kg is twice as heavy as 30 kg → ✅
₹0 means no money → ✅ (true zero)
Exploring Structure of Data
What it Means:
It means understanding the format, shape, and contents of a dataset before doing Machine
Learning.
Example:
You have a dataset of students:
| Roll No | Name | Age | Gender | Marks |
| --- | --- | --- | --- | --- |
| 101 | Akhil | 17 | Male | 78 |
| 102 | Suji | 18 | Female | 85 |
What you explore:
How many rows and columns? (→ Shape of the data)
What are the data types? (→ Numeric, Categorical, Text)
Are there missing or duplicate values?
What are the min, max, mean of numeric values like Age or Marks?
2. Data Quality and Remediation
What it Means:
Checking if data is clean and correct. If not, fix it — that’s called remediation.
❌ Common Data Quality Issues:

| Problem | Example | Fix (Remediation) |
| --- | --- | --- |
| Missing values | Age column has empty cell | Fill with average (mean), or drop the row |
| Incorrect format | Gender = "F" / "Female" / "f" | Standardize to "Female" |
| Outliers | Marks = 999 | Remove or correct |
| Duplicate rows | Two identical records | Keep only one |
| Inconsistent categories | Subject = "Maths", "math", "MATH" | Convert all to "Math" |
🧾 Example:
| Name | Age | Gender | Marks |
| --- | --- | --- | --- |
| Navalan | 17 | Male | 88 |
| Suji | — | Female | 90 |
| Akhil | 200 | Male | 999 |
Age missing → Fill with average (e.g., 18)
Age = 200 → Invalid, fix or remove
Marks = 999 → Outlier, remove
3. Data Pre-processing
📌 What it Means:
Making the data ready for training a model.
Steps in Data Pre-processing:
| Step | What it Does | Example |
| --- | --- | --- |
| 1. Handle Missing Values | Fill or remove missing data | Age = null → fill with mean |
| 2. Encoding Categorical Data | Convert text to numbers | Gender: Male = 0, Female = 1 |
| 3. Feature Scaling | Normalize values to same scale | Marks from 0–100 → 0–1 |
| 4. Remove Duplicates | Avoid repeated data | Same student row twice |
| 5. Data Splitting | Train/test split | 80% data for training, 20% for testing |
Input:

| Name | Age | Gender | Marks |
| --- | --- | --- | --- |
| Navalan | 17 | Male | 88 |
| Suji | null | Female | 90 |
| Akhil | 200 | Male | 999 |

Output:

| Name | Age | Gender (M=0, F=1) | Marks (scaled 0–1) |
| --- | --- | --- | --- |
| Navalan | 17 | 0 | 0.88 |
| Suji | 18 | 1 | 0.90 |
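A sketch of these pre-processing steps in pandas, reproducing the small table above; the fill value, the outlier rule, and the 0–1 scaling are illustrative choices, not the only valid ones:

```python
# Sketch: the pre-processing steps above in pandas. The fill value, outlier
# rule, and 0-1 scaling are illustrative choices, not the only correct ones.
import pandas as pd

df = pd.DataFrame({"Name": ["Navalan", "Suji", "Akhil"],
                   "Age": [17, None, 200],
                   "Gender": ["Male", "Female", "Male"],
                   "Marks": [88, 90, 999]})

df = df[(df["Age"].isna()) | (df["Age"] < 100)]            # drop invalid Age = 200 row
df = df[df["Marks"] <= 100]                                # drop outlier marks
df["Age"] = df["Age"].fillna(18)                           # fill missing age
df["Gender"] = df["Gender"].map({"Male": 0, "Female": 1})  # encode categories
df["Marks"] = df["Marks"] / 100                            # scale marks to 0-1
print(df)   # Navalan: 17, 0, 0.88 / Suji: 18, 1, 0.90
```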
Data Pre-processing in ML
Before training any model, data must be processed properly.
🔄 Steps in Preprocessing:
1. Handling Missing Data:
o Fill in missing values using:
Mean, Median, or Most Frequent value.
2. Encoding Categorical Variables:
o Convert text to numbers.
o Two popular methods:
Label Encoding (e.g., Yes → 1, No → 0)
One-Hot Encoding (creates binary columns)
3. Feature Scaling:
o Makes features have similar ranges.
o Two main types:
Normalization (Min-Max): scales between 0 and 1.
Standardization (Z-score): mean = 0, std dev = 1.
4. Splitting Data:
o Split dataset into:
Training Set: to train the model.
Test Set: to evaluate performance.
Unit-2
Modelling and Evaluation
This structured representation of raw input data as a meaningful pattern is called a model.
A machine learning algorithm creates its cognitive capability by building a mathematical formulation or function, known as the target function, based on the features in the input data set. Just like a child learning things for the first time needs her parents' guidance to decide whether she is right or wrong, in machine learning someone has to provide some non-learnable parameters, also called hyper-parameters. Without these human inputs, machine learning algorithms cannot be successful.
A cost function (also called error function) helps to measure the extent to which the model is going wrong in estimating the relationship between X and Y. In that sense, the cost function can tell how badly the model is performing. For example, R-squared (to be discussed later in this chapter) is a cost function of a regression model.
Loss function is almost synonymous with cost function, the only difference being that a loss function is usually defined on a single data point, while the cost function is for the entire training data set.
Machine learning is an optimization problem: we try to define a model and tune the parameters to find the most suitable solution to a problem. However, we need a way to evaluate the quality or optimality of a solution. This is done using an objective function. Objective means goal. The objective function takes in the data and the model (along with parameters) as input and returns a value. The target is to find values of the model parameters that maximize or minimize the returned value. When the objective is to minimize the value, it becomes synonymous with the cost function.
1. Introduction
When we start a Machine Learning project, the very first step is to understand:
What problem we want to solve
What data we have
Which type of ML fits the problem
Example:
Suppose we want to predict house prices in Chennai.
Problem: Estimate the price of a house based on size, location, and number of rooms.
Data: A dataset with past house sales (features: area, bedrooms, location; target: price).
ML type: This is a Supervised Learning (Regression) problem.
2. Selecting a Model
Here, we choose the algorithm (model) that is most suitable for our data and problem type.
Factors affecting choice:
1. Size of dataset
2. Type of output (numeric, category, yes/no)
3. Accuracy vs speed requirement
4. Interpretability (how easily we can explain it)
Example:
For predicting house prices:
Option 1: Linear Regression → simple, interpretable.
Option 2: Decision Tree → works with nonlinear relationships.
Option 3: Random Forest → more accurate but harder to interpret.
If interpretability is important, choose Linear Regression.
If accuracy matters more, choose Random Forest.
Explanation
Option 1: Linear Regression → simple, interpretable
What it does: Fits a straight-line (or plane in multiple dimensions) relationship between
inputs and output.
When to use: When the relationship between features and target is roughly linear.
Pros:
o Very easy to understand and explain.
o Fast to train.
o Works well if assumptions are met.
Cons:
o Struggles with complex, non-linear relationships.
o Sensitive to outliers.
House Price Example:
Price = 5000 × Area + 2,00,000 × Bedrooms + ...
Linear Regression can only handle numerical data directly.
If you have categorical data (like "City = Chennai/Bangalore"), you must first convert it into numbers
using encoding methods like One-Hot Encoding or Label Encoding before training.
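For instance, a hypothetical City column could be one-hot encoded with pandas like this (a sketch, not part of the original example):

```python
# Sketch: one-hot encoding a categorical "City" column (hypothetical data).
import pandas as pd

df = pd.DataFrame({"Area": [1200, 1500], "City": ["Chennai", "Bangalore"]})
print(pd.get_dummies(df, columns=["City"]))
# Produces City_Bangalore / City_Chennai indicator columns the model can use.
```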
Option 2: Decision Tree → works with nonlinear relationships
What it does: Splits data into decision rules (“if…then…” style) to make predictions.
When to use: When relationships between inputs and outputs are non-linear.
Pros:
o Handles both numerical and categorical data.
o Can capture complex patterns.
Cons:
o Can overfit if not pruned.
o Predictions may vary a lot for small changes in data.
House Price Example:
IF Area > 1500 sq.ft → Price > ₹60 lakh
ELSE IF Bedrooms >= 3 → Price = ₹55 lakh
ELSE → Price = ₹45 lakh
Option 3: Random Forest → more accurate but harder to interpret
What it does: Builds many decision trees and combines (averages) their results for
prediction.
When to use: When you want high accuracy and have enough data/computation power.
Pros:
o Reduces overfitting compared to a single tree.
o Handles complex non-linear relationships.
Cons:
o Not as easy to explain (hundreds of trees).
o Slower to train and predict than Linear Regression.
House Price Example:
o Tree 1 predicts ₹55 lakh
o Tree 2 predicts ₹52 lakh
o Tree 3 predicts ₹54 lakh
o Final price = average (₹53.67 lakh)
Which to choose?
If interpretability matters → Linear Regression (easy to explain to clients/stakeholders).
If data is complex and accuracy is top priority → Random Forest.
If you want a balance of complexity and interpretability → Decision Tree.
3. Training a Model (Supervised Learning)
Training means teaching the model from historical data so it can learn the relationship between
input features (X) and output labels (Y).
Steps:
1. Split data into:
o Training set (e.g., 80%)
o Test set (e.g., 20%)
2. Feed training data into the algorithm.
3. The model adjusts its internal parameters to reduce error.
Training a Model in Supervised Learning – Methods with Examples
1. Holdout Method
Concept: Split dataset into training and test sets (commonly 70% train, 30% test).
Example:
Suppose we have 1000 labeled emails (spam / not spam).
Split: 700 emails for training, 300 emails for testing.
Train model (e.g., Decision Tree) on 700 → Predict labels of 300.
Compare predicted vs actual labels → get accuracy (say 92%).
Simple, fast.
Problem: Some classes (e.g., very few spam emails) may not appear in test set properly.
Holdout Method
Data is split into Training (70–80%) and Test (20–30%).
Training set → used to build model.
Test set → used to evaluate performance (accuracy, error rate, etc.).
Random sampling used for splitting.
Problem: Class imbalance (some classes under-represented in test/train sets).
Solution: Stratified random sampling → ensures equal class proportions.
Sometimes 3 sets used:
o Training → build model
o Validation → tune/refine model
o Test → final evaluation
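A minimal sketch of the holdout split with stratified sampling, using scikit-learn's train_test_split on stand-in data:

```python
# Sketch of the holdout method with stratified sampling. The data is random;
# only the split mechanics matter here.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 5)                 # 1000 "emails", 5 features each
y = np.random.randint(0, 2, size=1000)      # labels: spam / not spam

# 70/30 split; stratify=y keeps the class proportions equal in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)
print(len(X_train), len(X_test))            # 700 300
```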
2. K-Fold Cross-Validation
📌 Concept: Split data into k equal folds, rotate test fold each time.
Example (10-Fold CV):
Dataset = 1000 patient records (disease present / absent).
Split into 10 folds (each = 100 records).
Iteration 1 → Fold 1 = test, folds 2–10 = train.
Iteration 2 → Fold 2 = test, folds 1,3–10 = train.
… Repeat until all folds tested.
Average accuracy across 10 runs = final performance (say 88%).
Special case: LOOCV (Leave-One-Out CV):
Dataset = 1000 records → train on 999, test on 1.
Repeat 1000 times.
Very accurate but very slow.
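A minimal sketch of 10-fold cross-validation with scikit-learn; make_classification is just a stand-in for the patient records:

```python
# Sketch of 10-fold cross-validation on a toy dataset with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)  # stand-in data
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
print(scores.mean())   # average accuracy across the 10 folds
```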
3. Bootstrap Sampling
📌 Concept: Sampling with replacement → the training set may contain duplicate records. The records never picked (≈36.8% on average, since (1 − 1/n)ⁿ ≈ e⁻¹) become the test set.
What is it?
A method to create training and test datasets using random sampling with replacement.
“With replacement” means:
o After picking one record, we put it back into the dataset, so it can be chosen again.
o That’s why the same record may appear multiple times in training.
o Some records may not appear at all → those become test set.
🔹 Why do we need it?
Useful when dataset is small (not enough data for normal holdout or k-fold).
By resampling many times, we can generate multiple training sets and get a better idea of
model performance.
🔹 Example
Suppose dataset has 5 students:
D = {S1, S2, S3, S4, S5}
We want a bootstrap sample of size 5 (same as original size).
👉 Randomly select with replacement:
1st pick → S3
2nd pick → S5
3rd pick → S3 (again, because replacement allows repeats)
4th pick → S1
5th pick → S2
Training set = {S3, S5, S3, S1, S2}
(Notice S3 is repeated, S4 missing)
Test set = {S4}
(Anything not picked goes to test set)
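The same bootstrap draw as a short Python sketch (the seed is chosen arbitrarily, so the exact picks are illustrative):

```python
# Sketch of one bootstrap sample: draw WITH replacement; leftovers form the test set.
import random

D = ["S1", "S2", "S3", "S4", "S5"]
random.seed(1)

train = random.choices(D, k=len(D))          # sampling with replacement
test = [s for s in D if s not in train]      # records never picked

print(train)   # duplicates possible, e.g. some Si repeated
print(test)    # the unpicked records become the test set
```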
4. Lazy vs. Eager Learners
Eager Learner Example
Suppose we use Decision Tree to classify fruits.
Training data: [Apple = red+round, Banana = yellow+long, Orange = orange+round].
The tree learns general rules:
o If color=yellow → Banana
o If color=red & round → Apple
o If color=orange & round → Orange
✅ Model is ready → Quick prediction.
❌ Training takes longer.
Lazy Learner Example (KNN)
Training data: same fruit dataset.
New test fruit: round, red.
KNN directly compares with training set (doesn’t build rules).
Finds nearest neighbors → predicts “Apple”.
✅ Training is instant.
❌ Prediction is slow (must compare each time).
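A minimal sketch of the lazy-learner example with scikit-learn's KNeighborsClassifier; the numeric fruit encoding is an assumption for illustration:

```python
# Sketch: lazy learner (KNN) on the fruit data. Encoding is an illustrative
# assumption: color Red=0, Yellow=1, Orange=2; shape Round=0, Long=1.
from sklearn.neighbors import KNeighborsClassifier

X = [[0, 0],   # Apple: red, round
     [1, 1],   # Banana: yellow, long
     [2, 0]]   # Orange: orange, round
y = ["Apple", "Banana", "Orange"]

knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)  # "training" just stores the data
print(knn.predict([[0, 0]]))                         # new round, red fruit -> ['Apple']
```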
MODEL REPRESENTATION AND INTERPRETABILITY
1. Bias
When we implement a statistical or machine learning model, we face two types of error: bias and variance. Bias and variance are the causes; underfitting and overfitting are the outcomes.
Bias is error because the model is too simple and fails to capture important patterns.
1. Imagine you are recording a song.
The real music has drums, guitar, and vocals (complex pattern).
But you use a very cheap microphone that only records in mono and cuts off high/low
frequencies.
Result: The recording captures only a basic flat version of the song → many details
(pattern) are missed.
2. Graph/Math Example
Suppose true data is a curve (quadratic).
You try to fit a straight line.
Model misses the curve → poor performance on both training and test.
Why it matters?
o Model cannot learn the real trend.
o Errors stay high even if we give more data.
o Leads to underfitting.
2. Variance: error because the model is too complex, fitting even random noise, and so performing badly on new data.
When it happens?
o Model is too complex.
o Too many parameters/features.
o Not enough regularization.
o Small training dataset (model memorizes instead of
generalizing).
Real-life (Music Example)
Using a super-sensitive mic that records every cough, fan noise,
or chair movement.
You get the song plus noise.
Playback is messy, doesn’t sound good outside studio.
Works well on training but fails badly on test.
Effect: Model captures noise instead of general trend.
3. Underfitting in Machine Learning
🔹 Definition:
Underfitting happens when a model is too simple to capture the hidden
patterns of the data.
It fails both on:
Training data ❌ (cannot learn enough), and
Test data ❌ (cannot generalize).
🔹 Causes of Underfitting
1. Too simple model → e.g., Linear regression for non-linear curved data.
2. Too few features → Important variables not included.
3. Insufficient training time → Model not trained properly.
4. Too much regularization → Restrictions stop the model from learning.
5. Not enough training data → Model doesn’t see enough examples to
learn.
🔹 Solution to Underfitting
✅ Use more complex models (e.g., polynomial regression, decision trees).
✅ Add more features.
✅ Reduce regularization.
✅ Train for longer time.
✅ Use more training data (if available).
4. Overfitting
🔹 Definition
Overfitting happens when a model learns the training data too well — including noise and
outliers.
It performs excellent on training data ✅
But poor on unseen/test data ❌
🔹 Cause (Why it happens?)
1. Model is too complex (too many parameters/features).
2. Small training dataset → model memorizes instead of generalizing.
3. Too many training iterations (learning for too long).
4. No regularization (no penalty for complexity).
🔹 Solution (How to avoid?)
1. Use cross-validation (k-fold).
2. Keep aside a validation set to monitor performance.
3. Simplify the model (reduce parameters, prune decision trees).
4. Apply regularization (L1, L2, Dropout in neural networks).
5. Collect more training data.
🔹 Exam Answer (Short form)
Overfitting occurs when the model fits the training data too closely, including noise and
outliers.
Causes: complex model, small data, no regularization.
Effect: high training accuracy, low test accuracy (poor generalization).
Avoided using: cross-validation, validation set, regularization, pruning, more data.
EVALUATING PERFORMANCE OF A MODEL
Supervised Learning – Classification (Theory +
Example)
1. Classification in Supervised Learning
Goal: Assign a class label (Win/Loss, Malignant/Benign,
Spam/Not Spam, etc.)
Uses predictor features → (e.g., toss result, no. of spinners,
past wins).
Model is evaluated by comparing predictions with actual
outcomes.
Confusion Matrix
A confusion matrix is a table that shows how well a classification model performs.
It compares the actual labels (ground truth) with the predicted labels (model
output).
Mainly used for classification problems (binary or multi-class).
Imagine a Cricket Prediction Model
We built a model to predict whether India will Win or Lose a match.
👉 Sometimes it predicts correct ✅
👉 Sometimes it predicts wrong ❌
We need a way to check how good this model is → that’s where
Confusion Matrix comes.
2. Confusion Matrix – Like a Scoreboard
| | Predicted Win | Predicted Loss |
| --- | --- | --- |
| Actual Win | ✅ TP (True Positive): model said Win & India actually Won | ❌ FN (False Negative): model said Loss but India actually Won |
| Actual Loss | ❌ FP (False Positive): model said Win but India actually Lost | ✅ TN (True Negative): model said Loss & India actually Lost |
👉 So:
TP = Correct Win prediction
TN = Correct Loss prediction
FP = Wrongly said Win
FN = Wrongly said Loss
3. Key Performance Metrics
(a) Accuracy
Proportion of all correct predictions.
Good for balanced datasets, but misleading when data is
imbalanced.
Example: Predicting disease where 99% people are healthy →
model says “healthy” for all → 99% accuracy but useless.
(f) Recall (same as Sensitivity)
Already defined.
(g) Kappa Statistic (κ)
Adjusted accuracy → removes “chance agreement”.
κ = 1 → Perfect, κ = 0 → Random.
Example Numbers (100 Matches)
India actually won 87 times.
India actually lost 13 times.
Model prediction results:
TP = 85 (correct Win predicted)
FN = 2 (missed Win)
FP = 4 (wrongly said Win)
TN = 9 (correct Loss predicted)
4. Now the Formulas (Think Simple)
Accuracy = how often the model is right:
Accuracy = (TP + TN) / Total = (85 + 9) / 100 = 94%
Error Rate = how often the model is wrong:
Error Rate = (FP + FN) / Total = (4 + 2) / 100 = 6%
Sensitivity / Recall = out of all actual Wins, how many did we catch?
Recall = TP / (TP + FN) = 85 / 87 ≈ 97.7%
Specificity = out of all actual Losses, how many did we catch?
Specificity = TN / (TN + FP) = 9 / 13 ≈ 69.2%
Precision = out of predicted Wins, how many were correct?
Precision = TP / (TP + FP) = 85 / 89 ≈ 95.5%
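These formulas can be checked with a few lines of Python using the counts from the cricket example:

```python
# Sketch: computing the metrics above from the cricket example counts.
TP, FN, FP, TN = 85, 2, 4, 9
total = TP + FN + FP + TN

accuracy    = (TP + TN) / total      # 0.94
error_rate  = (FP + FN) / total      # 0.06
recall      = TP / (TP + FN)         # ~0.977 (sensitivity)
specificity = TN / (TN + FP)         # ~0.692
precision   = TP / (TP + FP)         # ~0.955
print(accuracy, error_rate, recall, specificity, precision)
```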
5. When to Use Which Metric?
Medical case (Tumor detection) → Focus on
Sensitivity (don’t miss a patient with tumor).
Spam filtering → Focus on Precision (don’t
wrongly block important emails).
General cases → Accuracy, Precision, Recall all
matter.
✅ Memory Shortcuts:
Recall = "Did we catch all the real Wins?"
Precision = "Are predicted Wins mostly correct?"
Specificity = "Did we catch the real Losses
correctly?"
The Trade-Off: Precision vs. Recall (The Most Important
Concept)
You can't have both at their best at once. Improving one
often hurts the other.
High Precision, Lower Recall: A conservative
model. It only predicts "Positive" when
it's extremely sure.
o Spam Example: If you set the filter to be very
strict (high Precision), you'll have very few
non-spam emails in Spam (low FP), but you'll
also let more spam slip into your inbox (high
FN → low Recall).
High Recall, Lower Precision: An aggressive
model. It tries to catch all positives, even if it
means making some mistakes.
o Spam Example: If you set the filter to
catch every single spam (high Recall), it will
also be quick to mark suspicious emails as
spam, risking that some important emails get
caught (high FP → low Precision).
Evaluating Regression Models
1. Why Not a Confusion Matrix?
Regression models predict continuous values, so we cannot use a confusion matrix (that's only for classification).
👉 Instead, we check how far predicted values are from actual values.
2. Key Metrics in Regression: MAE, MSE, RMSE, and R² (computed in the example below).
3. Example (House Price Prediction)
Suppose actual vs predicted house prices (in ₹ lakhs):
| House | Actual (y) | Predicted (ŷ) |
| --- | --- | --- |
| 1 | 50 | 48 |
| 2 | 60 | 65 |
| 3 | 40 | 42 |
| 4 | 55 | 50 |
Step 1: Errors
Error = Actual – Predicted
House1 → 50 – 48 = 2
House2 → 60 – 65 = –5
House3 → 40 – 42 = –2
House4 → 55 – 50 = 5
Step 2: Metrics
MAE = (|2|+|–5|+|–2|+|5|)/4 = (14)/4 = 3.5
MSE = (2² + (–5)² + (–2)² + 5²)/4 = (4+25+4+25)/4 =
58/4 = 14.5
RMSE = √14.5 ≈ 3.8
R² → tells how much variance is explained. If R² = 0.90, the model explains 90% of the variation in the target.
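The same numbers can be reproduced with scikit-learn's regression metrics, as a quick sketch:

```python
# Sketch: the house-price metrics above computed with scikit-learn.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = [50, 60, 40, 55]
y_pred = [48, 65, 42, 50]

mae  = mean_absolute_error(y_true, y_pred)   # 3.5
mse  = mean_squared_error(y_true, y_pred)    # 14.5
rmse = np.sqrt(mse)                          # ~3.8
r2   = r2_score(y_true, y_pred)              # fraction of variance explained
print(mae, mse, rmse, r2)
```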
4. When to Use Which?
MAE → Simple, average error size.
MSE / RMSE → When large errors must be heavily
penalized.
R² → For overall model quality (goodness of fit).
✅ Shortcut memory:
MAE = Mean of |errors|: Average of the absolute
difference between Actual value and predicted
value
MSE = Mean of (errors²): Average of the square
difference b/w Actual and predicted value
RMSE = Square root of MSE
R² = How much variation explained by model
Understand the purpose of metrics like MAE: they tell us how far predictions are from actual values.
MAE = average mistake size
Example: MAE = 50 → on average, your prediction is off by 50 units.
🔹 Step 2: Why the target scale matters
Imagine two cases:
1. House Price (₹10,00,000)
o Error = 50
o Relative error = 50/10,00,000×100=0.005% → excellent.
o Means your predictions are almost perfect.
2. Product Price (₹300)
o Error = 50
o Relative error = 50/300×100=16.7%→ poor.
o Losing ~17% accuracy is not acceptable.
👉 That’s why we don’t just look at “50”; we look at “50 compared to what”
Feature engineering
Feature
A feature = attribute (column) of a dataset used in
ML.
Features = dataset dimensions.
Example: Iris dataset has 5 features: Sepal.Length,
Sepal.Width, Petal.Length, Petal.Width, Species.
→ Predictor features = first 4
→ Class feature = Species
🔹 Feature Engineering
Process of creating, selecting, or transforming
features to improve model performance.
Two main parts:
1. Feature Transformation: changing existing features into a new form so that the model can understand and learn better.
It does NOT create new information, but represents the data in a better way.
Feature transformation transforms the data, structured or unstructured, into a new set of features.
Two types:
A. Feature Construction = create new
features from old ones (adds
dimensions).
Example: Apartment length + breadth
→ construct new feature Area =
length × breadth.
B. Feature Extraction is the process of
extracting or creating a new set of features from
the original set of features using some
functional mapping.
Feature Selection (Subset Selection)
No new feature is made.
Just pick the most important features
from all.
Example: From 100 features, maybe only
20 matter → keep those.
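A short sketch combining both ideas: constructing an Area feature, then selecting the most predictive feature with scikit-learn's SelectKBest (the apartment numbers are invented):

```python
# Sketch: feature construction (new Area column) and simple feature selection.
# The apartment numbers are invented for illustration.
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression

df = pd.DataFrame({"length": [10, 12, 15], "breadth": [8, 9, 10],
                   "price": [40, 55, 75]})
df["area"] = df["length"] * df["breadth"]          # feature construction

X, y = df[["length", "breadth", "area"]], df["price"]
selector = SelectKBest(score_func=f_regression, k=1).fit(X, y)
print(X.columns[selector.get_support()])           # keeps the most predictive feature
```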