

ML For Engineers

Human learning is a cognitive process that involves gaining information through observation and experience, essential for decision-making and efficiency. It occurs through expert guidance, application of past knowledge, and self-learning. Machine learning, while inspired by human learning, involves data input, abstraction, and generalization to improve performance on tasks using past data.

Uploaded by

myr4112
Copyright © All Rights Reserved

Topic: What is Human Learning?

Sub-topic: Definition, Need, and Types


Presented by: Navalan M
Course: B.Tech / Ph.D Coursework
Date: [Insert Date]

Slide 1: What is Human Learning?

• In cognitive science, learning is the process of gaining information through observation.

• It helps us:

o Walk safely

o Do homework correctly

o Launch rockets with accurate angles

📌 Learning improves performance over time

Slide 2: Why Do We Need to Learn?

• Daily life requires decision-making and actions

• To do a task properly, we need:

o Prior information

o Experience

• 📈 More learning = Higher efficiency

🧠 Example:

• More math practice → fewer mistakes

• More rocket launches → better safety & precision

Slide 3: Types of Human Learning

Human learning happens in 3 ways:

Type | Description
1. Expert Guidance | Direct teaching from others
2. Guided by Past Knowledge | Apply what was learned before
3. Self-Learning | Learn by doing, trial & error

Slide 4: 1. Learning Under Expert Guidance


👶 Example: Baby learning words and colors from parents
🏫 School: Learning alphabets, math, grammar
🎓 College: Engineering, medicine, law
👨‍💼 Job: On-the-job training from seniors

📌 Mentors or teachers help because they have experience and subject knowledge

Slide 5: 2. Learning Guided by Past Knowledge

🧠 Use past knowledge to make new decisions


📌 Examples:

• Baby groups objects by color without being told

• Student identifies verbs in a sentence

• Professional targets the right customers based on old campaign data

⚠️ No direct teaching is involved, but learning is influenced by past expert input

Slide 6: 3. Self-Learning (Learning by Self)

• Learn through experience, mistakes, and practice

• Examples:

o Baby learning to walk through obstacles

o Child learning to ride a cycle

o Adult learning to drive a car

📌 We form personal checklists based on success/failure

Slide 7: Summary

✔️ Human learning is essential for growth, efficiency, and problem-solving

✔️ Learning can be:

• Direct (from experts)

• Indirect (from past lessons)

• Experiential (from our own actions)

Slide 8: Questions & Discussion

💬 Let’s discuss your own real-life examples of the 3 types of learning.

Learning by Self (Trial and Error)

Example (Your Life):


The first time you used new software (like SPSS or Python for data analysis), you learned by clicking buttons, making mistakes, and gradually understanding how to run tests. No one taught you every step.

Student-friendly example:

A student learns how to ride a bicycle. They fall a few times, but finally succeed by trying again and again.

What is Machine Learning? — Explained Simply

❓ Do Machines Really Learn?

Yes, machines can learn, but not the way humans do. In machine learning (ML), "learning" means:

A machine improves its performance on a task using past data (experience).

This idea is formalized in a famous definition by Tom M. Mitchell (Carnegie Mellon University):

🧠 “A computer program is said to learn from experience E, with respect to some task T and
performance measure P, if its performance on T, as measured by P, improves with experience E.”

🔄 Understanding the Definition with Examples

✅ Example 1: Playing Checkers

• E (Experience): Playing many games in the past

• T (Task): Playing checkers

• P (Performance): % of games won

If the machine wins more games over time, it means it is learning.

✅ Example 2: Image Classification

• E: Labeled images (e.g., cats, dogs, birds)

• T: Predicting the correct label for a new image

• P: Accuracy – how many predictions are correct

More correct classifications over time = better learning.

🧠 How Do Machines Learn?

Machine learning follows three major steps:

1. Data Input

o Machine receives historical data (e.g., exam scores, photos, sensor readings)
2. Abstraction

o It extracts patterns or "knowledge" from data

o Just like students summarize notes to focus on key points

3. Generalization

o It applies what it learned to new unseen data

o Like a student solving a new question based on concepts, not memorization

🤝 Analogy with Human Learning

Let’s compare to human learning:

Human Learning | Machine Learning
Reads textbook | Inputs data
Highlights key points | Extracts features/patterns
Solves new problems in exam | Predicts on new data

📘 Example: Animal Grouping (Human to ML Analogy)

Humans don’t memorize all animals. Instead, we group animals:

• Mammals: Warm-blooded, have fur

• Birds: Lay eggs, have feathers

• Fishes: Live in water, lay eggs

→ So we generalize:
"If an animal has feathers and lays eggs, it's likely a bird."

Abstraction in Machine Learning

🧠 What is Abstraction?

• When we feed input data to a machine, it is in raw form, like exam marks, weather reports, or bank transactions.

• This raw data cannot be used directly. It must be converted into a pattern or model; this process is abstraction.

• The model is the machine's knowledge created from data.

🔧 Model = Summarized Knowledge

This model could be:

• If/else rules (like: if salary > ₹50,000, mark as high-income)

• Mathematical equations (like: y = c₀ + c₁x)

• Trees/graphs (used in decision trees, social networks)

• Groupings (like clustering people based on age/income)

👨‍💻 Who chooses the model?

Humans do. The choice depends on:

• Type of problem (e.g., prediction, classification, trend analysis)

• Input data nature (e.g., numeric/categorical, missing values, size)

• Domain importance (e.g., fraud detection needs fast, accurate results)

🧪 Example: Fitting a Model

Suppose our model is:

y = c₀ + c₁x (Linear Regression)

We must find the best values for c₀ and c₁ from training data.
This is called training the model.

Once values are found → the model can now predict outcomes for new x values.
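As a minimal sketch of what "training" means here, the snippet below finds c₀ and c₁ with the ordinary least-squares formulas for a single input variable. Least squares is one common way to fit this model (an assumption; the text does not name a method), and the data points and the name fit_line are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = c0 + c1*x by ordinary least squares (one input variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    c1 = sxy / sxx
    # Intercept: the fitted line passes through (mean_x, mean_y)
    c0 = mean_y - c1 * mean_x
    return c0, c1

# Hypothetical training data that follows y = 1 + 2x exactly
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
c0, c1 = fit_line(xs, ys)      # training: learn c0 and c1
prediction = c0 + c1 * 6       # use the trained model on a new x
print(c0, c1, prediction)      # 1.0 2.0 13.0
```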

📘 2. Generalization

🤔 What is Generalization?

After training, the model is tested on new, unseen data.


If it performs well → it is said to generalize well.

Training = Learn from past


Generalization = Perform on future

❌ What Can Go Wrong?

1. Overfitting – model is too perfect on training data, fails on test data (like a student who memorizes the guide but can't solve new questions).

2. Test data is too different from training data. The model struggles, just like a human in a totally new situation.
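The memorizing-student problem can be sketched in code. Both hypothetical models below are perfect on the training data, but the lookup table has nothing to say about inputs it has not seen, while the simple threshold rule generalizes. The data, the "small/big" task and all names are illustrative assumptions.

```python
# Hypothetical data: numbers labelled "small" or "big"
train = {3: "small", 8: "small", 15: "big", 20: "big"}
test = {5: "small", 12: "big", 30: "big"}

def memorizer(x):
    # Overfitted "model": a lookup table of the training examples only
    return train.get(x, "unknown")

# Rule-based model: threshold halfway between the largest "small"
# and the smallest "big" training value
largest_small = max(x for x, y in train.items() if y == "small")
smallest_big = min(x for x, y in train.items() if y == "big")
threshold = (largest_small + smallest_big) / 2   # 11.5

def rule_model(x):
    return "big" if x > threshold else "small"

def accuracy(model, data):
    # Fraction of examples where the model's answer matches the label
    return sum(model(x) == y for x, y in data.items()) / len(data)

print(accuracy(memorizer, train), accuracy(memorizer, test))    # 1.0 0.0
print(accuracy(rule_model, train), accuracy(rule_model, test))  # 1.0 1.0
```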

🧠 Generalization is Like Human Gut-Feeling

Sometimes, even humans make decisions based on intuition or past experience.

E.g., a doctor identifies a rare disease based on symptoms they've seen only once.
Similarly, ML models may not always give perfect answers; they make educated guesses.

📘 3. Well-Posed Learning Problem (Tom Mitchell’s View)

To check if a problem is right for machine learning, ask:

✅ Step 1: What is the Problem?

Example:

I want a program to predict the next word I type.

Use Tom Mitchell's format:

• T (Task): Predict next word

• E (Experience): Past data (English text corpus)

• P (Performance): Prediction accuracy

Also write:

• Assumptions: Language is English, words follow grammar

• Similar Problems: Next-song suggestion, product recommendation

✅ Step 2: Why Solve It?

• Motivation: Improves typing speed, assists in writing

• Solution Benefits: Saves time, improves productivity

• Use Case: Used in mobile keyboards (like Gboard, SwiftKey)

✅ Step 3: How to Solve It?

Describe steps like:

1. Collect a dataset of typed sentences

2. Clean and prepare the data

3. Train a model using NLP techniques

4. Use model to predict next word

5. Test on unseen text
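As a rough sketch of steps 1 to 5, the code below uses a bigram counter (predict the word most often seen after the previous word) as a stand-in for the "NLP techniques" of step 3. It is far simpler than what real keyboards use; the corpus and function names are hypothetical. P is measured as the fraction of next words predicted correctly on an unseen sentence.

```python
from collections import Counter, defaultdict

def train_bigram(sentences):
    """Count, for each word, which words follow it in the training text."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()          # step 2: simple cleaning
        for prev, nxt in zip(words, words[1:]):
            following[prev][nxt] += 1
    return following

def predict_next(model, word):
    """Return the most frequent next word, or None if the word was never seen."""
    counts = model.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Step 1: a tiny hypothetical corpus of typed sentences
corpus = [
    "I am going home",
    "I am happy",
    "I am going to school",
    "we are going home",
]
model = train_bigram(corpus)                      # step 3: train

# Step 5: test on an unseen sentence; P = fraction of next words predicted correctly
test_words = "they are going home now".split()
pairs = list(zip(test_words, test_words[1:]))
correct = sum(predict_next(model, prev) == nxt for prev, nxt in pairs)
accuracy = correct / len(pairs)
print(predict_next(model, "I"), accuracy)         # am 0.5
```

Words never seen as a "previous word" in training (like "they" or "home") get no prediction, which is one reason real systems need much larger corpora.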

🧾 Summary Chart
Term | Meaning
Abstraction | Convert raw data into a usable pattern (model)
Model | Summarized form of data (equation, tree, rules, etc.)
Training | Fitting the model to past (training) data
Generalization | Using the model on new data and still making correct predictions
Overfitting | Model memorizes too much, can't work on new data
Well-posed Problem | Task clearly defined by Task (T), Experience (E), and Performance (P)

The machine learning process has 3 parts:

✅ 1. Data Input (Give past examples to the computer)

We give old data to the computer.

🧒 Example: You tell the computer:

Color | Shape | Is Apple?
Red | Round | Yes
Green | Round | Yes
Yellow | Long | No

This is called training data. The computer will learn from this.

🔁 2. Abstraction (Computer finds patterns inside)

Now the computer thinks like this:

“Oh! Apples are usually red or green and round. Bananas are yellow and long.”

So the computer builds a rule in its brain.

It does not just remember your table — it finds a general rule.

📈 3. Generalization (Use that rule on new data)

Now you give new fruit info:

🔸 Color: Red, Shape: Round

Computer says ➡️“Yes, this is probably an apple!”

This is called Generalization:

Using what it learned to make a smart guess about new things.

🍎 A Real-Life Example (Fruits)

You are the “machine” in this case:

• You see many fruits.

• You observe their color, shape.

• You learn: Red & round → usually apple.

• Next time someone gives you a red round fruit, you say: “This is probably an apple.”
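The three steps on the fruit table can be sketched as a tiny rule learner. The abstraction step summarizes the apple rows into a rule ("colour is red or green, and shape is round"), and the generalization step applies that rule to a new fruit. This hand-rolled learner and its function names are illustrative assumptions; real algorithms such as decision trees learn rules more carefully.

```python
# 1. Data input: the training table (color, shape, is_apple)
training_data = [
    ("Red", "Round", True),
    ("Green", "Round", True),
    ("Yellow", "Long", False),
]

def learn_rule(rows):
    """2. Abstraction: summarize the colors and shapes seen in apples."""
    apple_colors = {color for color, shape, is_apple in rows if is_apple}
    apple_shapes = {shape for color, shape, is_apple in rows if is_apple}
    return apple_colors, apple_shapes

def predict(rule, color, shape):
    """3. Generalization: apply the learned rule to a new fruit."""
    apple_colors, apple_shapes = rule
    return color in apple_colors and shape in apple_shapes

rule = learn_rule(training_data)
print(predict(rule, "Red", "Round"))     # True  -> "probably an apple"
print(predict(rule, "Yellow", "Long"))   # False
```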

✅ Types of Machine Learning

Machine Learning is mainly divided into 3 types:

1. Supervised Learning
2. Unsupervised Learning

3. Reinforcement Learning

Let’s go one by one with real-world examples.

Supervised learning – Also called predictive learning. The model learns from labelled data.

Unsupervised learning – Also called descriptive learning. The model finds patterns in unlabelled data.

Reinforcement learning – The model learns through trial and error by interacting with an environment and receiving a reward or penalty based on its actions.

🟦 1. Supervised Learning

📌 Meaning:
In this method, the computer is given both the input and the correct output during training. It learns
by comparing its answer to the correct answer.

🧠 Think of it like a teacher supervising a student.

Predicting whether a tumour is malignant or benign

Predicting the price of domains like real estate, stocks, etc.

Are these two problems the same in nature? The answer is ‘no’. Though both of them are prediction problems, in one case we are trying to predict which category or class an unknown data item belongs to, whereas in the other case we are trying to predict an absolute value and not a class. When we are trying to predict a categorical or nominal variable, the problem is known as a classification problem. When we are trying to predict a real-valued variable, the problem falls under the category of regression.

Test data is used to evaluate how the model performs on unseen data. For example, in image classification the machine is given an image of unknown category, also called test data, and must predict its class.

Training data

How well the model predicts depends on the information it gets from past labelled data, which we call training data. During training, the model is modified (its parameters are adjusted) to fit this data.

Typical classification problems include:

Image classification

Prediction of disease

Win–loss prediction of games

Prediction of natural calamity like earthquake, flood, etc.

Recognition of handwriting

Regression fits a relationship such as y = ax + b, where x is the predictor (input) variable and y is the target variable. Typical applications of regression include:

Demand forecasting in retails

Sales prediction for managers

Price prediction in real estate

Weather forecast

Skill demand forecast in job market

2. Unsupervised Learning (also referred to as pattern discovery or knowledge discovery)

Meaning:
Here, only input data is given – no answers. The computer tries to find patterns or groupings on its
own.

Think of it like a student learning without a teacher.

A. Clustering: group or organize similar objects together. One of the most commonly adopted similarity measures is distance.

Customer A buys: Milk, Bread, Butter

Customer B buys: Milk, Bread, Butter

Customer C buys: Soap, Shampoo

Customers A and B have similar shopping styles → distance small → same cluster.
Customer C is different → distance large → different cluster.

Distance can be based on differences such as:

• Age difference

• Shopping item difference

• Color or shape difference

When data values are very similar, the “distance” is low.
When they are very different, the “distance” is high.
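One possible way to measure the distance between the shopping baskets above is the Jaccard distance: 1 minus (shared items / all items), so identical baskets score 0 and baskets with nothing in common score 1. The choice of Jaccard distance and the simple grouping rule (join the first cluster whose first member is closer than a hypothetical threshold of 0.5) are assumptions for illustration, not a standard algorithm such as k-means.

```python
def jaccard_distance(a, b):
    """1 - (shared items / all items); 0 means identical baskets."""
    return 1 - len(a & b) / len(a | b)

customers = {
    "A": {"Milk", "Bread", "Butter"},
    "B": {"Milk", "Bread", "Butter"},
    "C": {"Soap", "Shampoo"},
}

def group(customers, threshold=0.5):
    """Put each customer in the first cluster whose first member is close enough."""
    clusters = []
    for name, basket in customers.items():
        for cluster in clusters:
            if jaccard_distance(basket, customers[cluster[0]]) < threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])   # no close cluster found: start a new one
    return clusters

print(jaccard_distance(customers["A"], customers["B"]))  # 0.0
print(jaccard_distance(customers["A"], customers["C"]))  # 1.0
print(group(customers))                                  # [['A', 'B'], ['C']]
```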

B. Association Analysis: as a part of association analysis, the association between data elements is identified.

Association means: “if this happens, that also happens”
→ It finds patterns like:
“If item A is present, item B is also likely to be present.”

If many people who buy bread also buy butter,
→ this is an association.
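Association rules are commonly scored with two measures that the text above does not name: support (the fraction of baskets containing all the items) and confidence (how often B is present when A is). The baskets below are hypothetical.

```python
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal", "sugar"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(a, b, transactions):
    """Estimated P(b in basket | a in basket) for the rule a -> b."""
    return support(a | b, transactions) / support(a, transactions)

print(support({"bread", "butter"}, transactions))        # 0.5
print(confidence({"bread"}, {"butter"}, transactions))   # about 0.67
```

Here the rule bread → butter has confidence 2/3: two of the three baskets with bread also contain butter.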

Example 1: Market Segmentation

• A shopping website collects data like:

o Age, spending habits, items bought

• It finds that there are 3 customer groups:

o Young buyers

o Senior buyers

o Bulk buyers

Example 2: Market Basket Analysis

• In a supermarket:

o “If a person buys milk and cereal, they also likely buy sugar.”

• The shop can then:

o Place those items near each other

o Give combo offers (boosts sales)

🔹 Applications:

• Online shopping recommendations (e.g., Amazon: “People who bought this also bought...”)

• Product placement strategies in stores

📌 Simple Comparison:

Feature | Clustering | Association Analysis
Goal | Group similar items | Find items that occur together
Input Data | Unlabeled | Unlabeled
Output | Groups (Clusters) | Rules (like A → B)
Real-life Example | Customer segmentation | Market basket, product recommendation

3. Reinforcement Learning
Meaning:
The computer learns by doing. It takes actions, and gets rewards or punishments (like
points).

Like training a dog with rewards.


💡 Example 1: Playing a Game (Chess, Mario, etc.)

• The computer tries a move.

• If it wins → reward.

• If it loses → penalty.

• Over time, it learns the best way to play.

Example 2: Self-driving Car

• The car gets a reward for safe driving.

• Gets a penalty for accidents or going off track.

Used for:

• Robotics

• Game playing (AlphaGo)

• Self-driving cars
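Full reinforcement learning involves changing situations (states), as in chess or driving. The simplest special case, a "multi-armed bandit" with a single state, already shows the reward/penalty loop. In this sketch the action names and reward values are hypothetical: the agent tries every action once, then mostly repeats the action with the best average reward while occasionally exploring a random one (epsilon-greedy).

```python
import random

# Hypothetical environment: each action always returns the same reward
REWARDS = {"move_left": -1.0, "stay": 0.0, "move_right": 1.0}

def play(action):
    return REWARDS[action]   # reward (+) or penalty (-)

def train(steps=100, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    actions = list(REWARDS)
    value = {a: 0.0 for a in actions}   # estimated average reward per action
    count = {a: 0 for a in actions}
    for step in range(steps):
        if step < len(actions):
            action = actions[step]                  # try every action once
        elif rng.random() < epsilon:
            action = rng.choice(actions)            # explore: random action
        else:
            action = max(actions, key=value.get)    # exploit: best so far
        reward = play(action)
        count[action] += 1
        # Incremental average: new = old + (reward - old) / n
        value[action] += (reward - value[action]) / count[action]
    return value

values = train()
print(max(values, key=values.get))   # move_right
```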

Summary Table:
Type | Input Data | Output Data Given? | Goal | Example
Supervised Learning | Yes | Yes | Predict correct answer | Email spam detection
Unsupervised Learning | Yes | No | Find hidden patterns | Customer grouping
Reinforcement Learning | Through actions | Reward/Penalty | Learn by trial & error | Playing games, robot navigation

Problems That Should NOT Be Solved Using Machine Learning

A. Simple Rule-Based Logic

• Example 1: Calculating the area of a circle (use the formula, not ML)

• Example 2: Converting Celsius to Fahrenheit (F = C × 9/5 + 32)

• Example 3: Checking if a number is even or odd

These are deterministic; there is no need to "learn" them from data.
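Each of these problems is a fixed formula or rule that is always correct, so there is nothing for a model to learn from data. A few lines of Python are enough:

```python
import math

def circle_area(radius):
    return math.pi * radius ** 2        # exact formula, no training needed

def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

def is_even(n):
    return n % 2 == 0

print(celsius_to_fahrenheit(100), is_even(7))   # 212.0 False
```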

B. Small Datasets

• ML needs data to learn patterns. Very small datasets don't justify using ML.

• Example 1: Trying to predict student marks using data of only 4 students

• Example 2: Predicting disease from only 5 patient records

C. High-Risk Situations with Uncertainty

Reason: Machine Learning is often a black box. In critical areas, we need 100% trust and explanation.

• Example 1: Deciding cancer treatment without human doctor review

• Example 2: Controlling nuclear reactors or air traffic

• Example 3: Deciding court judgments or legal punishments

In such cases, ML can assist, but should not decide.

D. When Data Doesn’t Change Over Time

• If conditions are always the same, rule-based solutions are better.

E. When the Outcome is Already 100% Known and Fixed

Reason: No uncertainty or variation in output.

• Example 1: Traffic light rules (Red = stop, Green = go)

F. Tasks Requiring Common Sense / Moral Judgment

Reason: ML lacks emotions and ethical thinking.

• Example 1: Deciding who to save in a car accident

• Example 2: Choosing a fair punishment in a family dispute

Needs human judgment, not just pattern recognition.

Issues in Machine Learning While Preparing to Model

Issue No. | Issue | Explanation | Example
1 | Missing Values | Data has blank/null entries, which confuses the model. | Age is missing for 20% of survey respondents.
2 | Imbalanced Data | One class dominates, so the model ignores the minority class. | 95% non-fraud, 5% fraud – the model may always predict “non-fraud”.

Real-Life Example: Fraud Detection

Transaction | Label (Fraud or Not)
₹1000 | Not Fraud
₹2500 | Not Fraud
₹980 | Not Fraud
₹8000 | Fraud
₹300 | Not Fraud
₹450 | Not Fraud

👉 Out of 1000 transactions:

• 950 are Not Fraud

• 50 are Fraud

Now, if a model always predicts "Not Fraud" for every transaction:

🤖 What the model sees:

• 950 correct predictions (Not Fraud)

• 50 wrong predictions (missed all Fraud)

Its accuracy is 950/1000 = 95%, yet it does not catch a single fraud case.
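The arithmetic behind this example, as a short sketch: accuracy looks high, but recall for the Fraud class (the share of real fraud cases that are caught, a standard metric not named in the text) exposes the problem.

```python
# 950 "Not Fraud" and 50 "Fraud" labels, as in the example above
actual = ["Not Fraud"] * 950 + ["Fraud"] * 50
predicted = ["Not Fraud"] * len(actual)      # model always says "Not Fraud"

correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)

# Recall for the Fraud class: fraction of real fraud cases that were caught
caught = sum(a == p == "Fraud" for a, p in zip(actual, predicted))
fraud_recall = caught / actual.count("Fraud")

print(accuracy, fraud_recall)   # 0.95 0.0
```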

Issue No. | Issue | Explanation | Example
3 | Overfitting | Model memorizes training data but can't generalize. | 100% accuracy on training data but only 60% on test data.
4 | Underfitting | Model is too simple to capture trends. | Predicting stock prices using only the day of week.
5 | Feature Selection | Irrelevant or too many features reduce accuracy. | Including email ID in spam prediction adds noise.
6 | Data Scaling | Features with different units affect performance. | Height in cm vs weight in kg – the scale difference confuses the model.
7 | Noise in Data | Unwanted or incorrect data disturbs learning. | Text with spelling mistakes or repeated values.
8 | Label Errors | Incorrect labels reduce model accuracy. | "Dog" image labeled as "Cat".

Feature Selection

Why is it Important?

• If you give the model too many features, especially irrelevant ones, it gets confused.

• This leads to lower accuracy and longer training time.

Why is "Email ID" bad?

• It is unique for every person.

• It has no relation to spam or not spam.

• The model may start to memorize IDs, not learn patterns.

• If a new email ID comes, the model fails.
🔄 Additional Issues (with Examples)

Issue No. | Issue | Explanation | Example
9 | Multicollinearity | Two or more features are highly correlated. | Height and leg length – both increase together.
10 | Outliers | Unusual data points distort the model. | One customer spent ₹10 lakh while others spent ₹10k.
11 | Data Leakage | Model accidentally uses future info from the dataset. | Using test scores to predict student admission before the scores are known.
12 | Redundant Features | Duplicate or repeated info adds no value. | Having both "DOB" and "Age" in the same dataset.
13 | Class Overlap | Features for different classes are too similar. | Cats and small dogs in an image dataset are visually similar.
14 | Unstructured Data | Raw text, audio, images require preprocessing. | Feeding raw tweets into a model without cleaning.
15 | Irrelevant Features | Data not related to the prediction task. | Including a user's favorite color to predict loan approval.
16 | Sparse Data | Most values are zero or missing, making learning hard. | Product rating matrix where users rate only a few items.
17 | Data Drift | Data changes over time, so the model becomes outdated. | Customer behavior changed after COVID but the model uses old data.
18 | Duplicate Entries | Same data appears more than once. | Two records of the same person create bias.

Multicollinearity

Multicollinearity happens when two or more input features (columns) are highly correlated, meaning they carry the same information. The model receives duplicate signals, which causes confusion during learning.

Example (house data):

• Area in square feet

• Number of rooms

• Total carpet area

"Area" and "Carpet Area" may be almost the same, so including both is redundant.

Label Errors in Machine Learning

What is a "Label"?

In supervised learning, labels are the correct answers (output) for the training data.

• For image classification:
Input = Dog image, Label = "Dog"

• For email spam detection:
Input = Email text, Label = "Spam" or "Not Spam"

What are Label Errors?

A label error means the wrong output is given for an input during training.

What is Data Leakage?

Data Leakage happens when the model accidentally uses information during training that it
shouldn’t know — especially future data or test data.

This gives the model an unfair advantage, so it performs very well in training but fails in real-world
use.

Simple Example: Student Admission Prediction

Goal: Predict whether a student will be admitted based on input features.

Feature 1 | Feature 2 | Feature 3 | Label
Grades | Attendance | Entrance Exam Score | Admitted (Yes/No)

Problem: If the entrance exam hasn’t happened yet (future data), using its score to predict admission
is data leakage.

📌 You're training the model with information it won’t have during real prediction.

BASIC TYPES OF DATA IN MACHINE LEARNING

Qualitative data provides information about the quality of an object or information which cannot be measured. For example, if we consider the quality of performance of students in terms of ‘Good’, ‘Average’, and ‘Poor’, it falls under the category of qualitative data. Also, name or roll number of students are information that cannot be measured using some scale of measurement. So they would fall under qualitative data. Qualitative data is also called categorical data.

Qualitative data can be further subdivided into two types as follows:

1. Nominal data

2. Ordinal data

Nominal data is one which has no numeric value, but a named value. It is used for assigning named
values to attributes.

1. Blood group: A, B, O, AB, etc.

2. Nationality: Indian, American, British, etc.

3. Gender: Male, Female, Other

Mathematical operations such as addition, subtraction, multiplication, etc. cannot be performed on nominal data.

Ordinal data, in addition to possessing the properties of nominal data, can also be naturally ordered. This means ordinal data also assigns named values to attributes but, unlike nominal data, they can be arranged in a sequence of increasing or decreasing value so that we can say whether a value is better than or greater than another value. Examples of ordinal data are

1. Customer satisfaction: ‘Very Happy’, ‘Happy’, ‘Unhappy’, etc.

2. Grades: A, B, C, etc.

3. Hardness of Metal: ‘Very Hard’, ‘Hard’, ‘Soft’, etc.

Quantitative data relates to information about the quantity of an object – hence it can be measured.
For example, if we consider the attribute ‘marks’, it can be measured using a scale of measurement.
Quantitative data is also termed as numeric data. There are two types of quantitative data:

1. Interval data:

Values are numerical.

You can do addition and subtraction.

BUT there is no true zero — zero does not mean "nothing".

You cannot multiply or divide meaningfully.

You cannot say “twice as much”.

Difference is known, but no true zero

• 30°C is hotter than 20°C → ✅ (difference is 10°C)

• 40°C = 20°C + 20°C → ✅ (addition is possible)

• But 40°C is not “twice as hot” as 20°C → ❌

2. Ratio data:

• Values are numerical.

• Has an absolute zero (zero = nothing).

• You can add, subtract, multiply, and divide.

• You can say “twice as much” or “half”.

📏 Example: Height, Weight, Age, Money

• A person weighing 60 kg is heavier than one weighing 30 kg → ✅

• 60 kg = 30 kg + 30 kg → ✅

• 60 kg is twice as heavy as 30 kg → ✅

• ₹0 means no money → ✅ (true zero)
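A quick numeric check of why 40°C is not "twice as hot" as 20°C: converting to Kelvin, a temperature scale whose zero is absolute zero (a fact added here, not stated above), gives a ratio of only about 1.07, while the ratio of two weights is meaningful as it stands.

```python
def celsius_to_kelvin(c):
    # Kelvin starts at absolute zero, so it is a ratio scale
    return c + 273.15

# Interval scale (°C): dividing the numbers does not compare amounts of heat
print(40 / 20)                                        # 2.0, yet not "twice as hot"
print(celsius_to_kelvin(40) / celsius_to_kelvin(20))  # about 1.068

# Ratio scale (kg): zero means no weight, so the ratio is meaningful
print(60 / 30)                                        # 2.0: twice as heavy
```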


1. Exploring Structure of Data
What it Means:

It means understanding the format, shape, and contents of a dataset before doing Machine Learning.

Example:

You have a dataset of students:

Roll No | Name | Age | Gender | Marks
101 | Akhil | 17 | Male | 78
102 | Suji | 18 | Female | 85

What you explore:

• How many rows and columns? (→ Shape of the data)

• What are the data types? (→ Numeric, Categorical, Text)

• Are there missing or duplicate values?

• What are the min, max, mean of numeric values like Age or Marks?
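With the pandas library these checks are typically df.shape, df.dtypes, df.isnull().sum(), df.duplicated().sum() and df.describe(). The sketch below performs the same checks in plain Python so that it runs without extra packages; storing the table as a list of dicts is an assumption.

```python
import statistics

# The student table above as a list of rows (dicts)
students = [
    {"Roll No": 101, "Name": "Akhil", "Age": 17, "Gender": "Male", "Marks": 78},
    {"Roll No": 102, "Name": "Suji", "Age": 18, "Gender": "Female", "Marks": 85},
]

columns = list(students[0])
shape = (len(students), len(columns))                         # (rows, columns)
types = {c: type(students[0][c]).__name__ for c in columns}   # data type per column
missing = {c: sum(row.get(c) is None for row in students) for c in columns}
unique_rows = {tuple(row.values()) for row in students}
duplicates = len(students) - len(unique_rows)

# Summary statistics for the numeric columns
for col in ("Age", "Marks"):
    values = [row[col] for row in students if row[col] is not None]
    print(col, min(values), max(values), statistics.mean(values))

print(shape, types, missing, duplicates)
```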

2. Data Quality and Remediation


What it Means:

Checking if data is clean and correct. If not, fix it — that’s called remediation.

❌ Common Data Quality Issues:

Problem | Example | Fix (Remediation)
Missing values | Age column has empty cell | Fill with average (mean), or drop the row
Incorrect format | Gender = "F" / "Female" / "f" | Standardize to "Female"
Outliers | Marks = 999 | Remove or correct
Duplicate rows | Two identical records | Keep only one
Inconsistent categories | Subject = "Maths", "math", "MATH" | Convert all to "Math"

🧾 Example:

Name | Age | Gender | Marks
Navalan | 17 | Male | 88
Suji | (missing) | Female | 90
Akhil | 200 | Male | 999

• Age missing → Fill with average (e.g., 18)

• Age = 200 → Invalid, fix or remove

• Marks = 999 → Outlier, remove
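The remediation steps can be sketched as follows. The valid ranges for Age and Marks are hypothetical thresholds; here Akhil's row is dropped because both of its values are impossible, and Suji's missing age is filled with the mean of the remaining valid ages (17 in this tiny table; the 18 above is only an illustrative value).

```python
import statistics

rows = [
    {"Name": "Navalan", "Age": 17, "Gender": "Male", "Marks": 88},
    {"Name": "Suji", "Age": None, "Gender": "Female", "Marks": 90},   # Age missing
    {"Name": "Akhil", "Age": 200, "Gender": "Male", "Marks": 999},
]

# Assumed valid ranges (hypothetical; choose them per dataset)
AGE_RANGE = (5, 100)
MARKS_RANGE = (0, 100)

def in_range(value, bounds):
    return value is not None and bounds[0] <= value <= bounds[1]

# Remove rows with impossible values (Age = 200, Marks = 999); keep missing ages
clean = [r for r in rows
         if in_range(r["Marks"], MARKS_RANGE)
         and (r["Age"] is None or in_range(r["Age"], AGE_RANGE))]

# Fill missing ages with the mean of the remaining valid ages
mean_age = statistics.mean(r["Age"] for r in clean if r["Age"] is not None)
for r in clean:
    if r["Age"] is None:
        r["Age"] = mean_age

print(clean)
```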

3. Data Pre-processing
📌 What it Means:

Making the data ready for training a model.

Steps in Data Pre-processing:

Step | What it Does | Example
1. Handle Missing Values | Fill or remove missing data | Age = null → fill with mean
2. Encoding Categorical Data | Convert text to numbers | Gender: Male = 0, Female = 1
3. Feature Scaling | Normalize values to same scale | Marks from 0–100 → 0–1
4. Remove Duplicates | Avoid repeated data | Same student row twice
5. Data Splitting | Train/test split | 80% data for training, 20% for testing

Input

Name | Age | Gender | Marks
Navalan | 17 | Male | 88
Suji | null | Female | 90
Akhil | 200 | Male | 999

Output (Akhil's row is removed because Age = 200 and Marks = 999 are invalid)

Name | Age | Gender (Male = 0, Female = 1) | Marks (scaled 0–1)
Navalan | 17 | 0 | 0.88
Suji | 18 | 1 | 0.90

Data Pre-processing in ML
Before training any model, data must be processed properly.

🔄 Steps in Preprocessing:

1. Handling Missing Data:

o Fill in missing values using:

• Mean, Median, or Most Frequent value.

2. Encoding Categorical Variables:

o Convert text to numbers.

o Two popular methods:

• Label Encoding (e.g., Yes → 1, No → 0)

• One-Hot Encoding (creates binary columns)

3. Feature Scaling:

o Makes features have similar ranges.

o Two main types:

• Normalization (Min-Max): scales between 0 and 1.

• Standardization (Z-score): mean = 0, std dev = 1.

4. Splitting Data:

o Split dataset into:

• Training Set: to train the model.

• Test Set: to evaluate performance.
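The four steps can be sketched in plain Python on a few hypothetical student records. Label encoding, one-hot encoding, min-max normalization, z-score standardization (using the population standard deviation, an assumption) and an 80/20 split each take a few lines; libraries such as scikit-learn provide ready-made versions of all of them.

```python
import random
import statistics

# Hypothetical records (missing values are assumed already filled)
marks = [88, 90, 72, 65, 95]
genders = ["Male", "Female", "Female", "Male", "Female"]

# Label encoding: one integer per category (Male = 0, Female = 1)
label_map = {"Male": 0, "Female": 1}
gender_labels = [label_map[g] for g in genders]

# One-hot encoding: one binary column per category
categories = sorted(set(genders))                  # ['Female', 'Male']
gender_one_hot = [[int(g == c) for c in categories] for g in genders]

# Normalization (min-max): rescale to the range 0..1
lo, hi = min(marks), max(marks)
marks_minmax = [(m - lo) / (hi - lo) for m in marks]

# Standardization (z-score): mean 0, standard deviation 1
mu, sigma = statistics.mean(marks), statistics.pstdev(marks)
marks_z = [(m - mu) / sigma for m in marks]

# Train/test split: shuffle row indices, keep 80% for training
indices = list(range(len(marks)))
random.Random(42).shuffle(indices)
cut = int(0.8 * len(indices))
train_idx, test_idx = indices[:cut], indices[cut:]

print(gender_labels, gender_one_hot[0], len(train_idx), len(test_idx))
```

In a real project the min, max, mean and standard deviation would usually be computed from the training rows only and then applied to the test rows, so that no test information leaks into training.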


Unit-2
Modelling and Evaluation
This structured representation of raw input data as a meaningful pattern is called a model.

A machine learning algorithm creates its cognitive capability by building a mathematical formulation or function, known as the target function, based on the features in the input data set. Just as a child learning things for the first time needs her parents' guidance to decide whether she is right or wrong, in machine learning someone has to provide some non-learnable parameters, also called hyperparameters. Without these human inputs, machine learning algorithms cannot be successful.

A cost function (also called error function) helps to measure the extent to which the model is going wrong in estimating the relationship between X and Y. In that sense, the cost function can tell how badly the model is performing. For example, the mean squared error (MSE) is a common cost function for regression models; R-squared (to be discussed later in this chapter) is a related measure of how well a regression model fits.
The loss function is almost synonymous with the cost function; the only difference is that the loss function is usually defined on a single data point, while the cost function is defined on the entire training data set. Machine learning is an optimization problem. We try to define a model and tune the parameters to find the most suitable solution to a problem. However, we need a way to evaluate the quality or optimality of a solution. This is done using an objective function. Objective means goal. The objective function takes in data and the model (along with its parameters) as input and returns a value. The target is to find values of the model parameters that maximize or minimize the return value. When the objective is to minimize the value, it becomes synonymous with the cost function.
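As an illustration, using mean squared error as the cost (an assumption; it is one common choice for regression), the loss is computed for each data point and the cost averages the losses over the training set. Training searches for the c₀ and c₁ that minimize this cost; the data are hypothetical.

```python
def predict(x, c0, c1):
    return c0 + c1 * x

def squared_error_loss(y_true, y_pred):
    """Loss: error on a single data point."""
    return (y_true - y_pred) ** 2

def mse_cost(xs, ys, c0, c1):
    """Cost: average loss over the whole training set."""
    losses = [squared_error_loss(y, predict(x, c0, c1)) for x, y in zip(xs, ys)]
    return sum(losses) / len(losses)

xs, ys = [1, 2, 3], [3, 5, 7]          # hypothetical data following y = 1 + 2x
print(mse_cost(xs, ys, 0, 2))          # 1.0: a worse line, higher cost
print(mse_cost(xs, ys, 1, 2))          # 0.0: the best line, lowest cost
```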

1. Introduction

When we start a Machine Learning project, the very first step is to understand:

• What problem we want to solve

• What data we have

• Which type of ML fits the problem

Example:

Suppose we want to predict house prices in Chennai.

• Problem: Estimate the price of a house based on size, location, and number of rooms.

• Data: A dataset with past house sales (features: area, bedrooms, location; target: price).

• ML type: This is a Supervised Learning (Regression) problem.

2. Selecting a Model

Here, we choose the algorithm (model) that is most suitable for our data and problem type.

• Factors affecting choice:

1. Size of dataset

2. Type of output (numeric, category, yes/no)

3. Accuracy vs speed requirement

4. Interpretability (how easily we can explain it)

Example:

For predicting house prices:

• Option 1: Linear Regression → simple, interpretable.

• Option 2: Decision Tree → works with nonlinear relationships.

• Option 3: Random Forest → more accurate but harder to interpret.

If interpretability is important, choose Linear Regression.
If accuracy matters more, choose Random Forest.

Explanation
Option 1: Linear Regression → simple, interpretable

• What it does: Fits a straight-line (or plane in multiple dimensions) relationship between inputs and output.

• When to use: When the relationship between features and target is roughly linear.

• Pros:

o Very easy to understand and explain.

o Fast to train.

o Works well if assumptions are met.

• Cons:

o Struggles with complex, non-linear relationships.

o Sensitive to outliers.

• House Price Example:

Price = 5000 × Area + 2,00,000 × Bedrooms + ...

Linear Regression can only handle numerical data directly.

If you have categorical data (like "City = Chennai/Bangalore"), you must first convert it into numbers using encoding methods like One-Hot Encoding or Label Encoding before training.

Option 2: Decision Tree → works with nonlinear relationships

• What it does: Splits data into decision rules (“if…then…” style) to make predictions.

• When to use: When relationships between inputs and outputs are non-linear.

• Pros:

o Handles both numerical and categorical data.

o Can capture complex patterns.

• Cons:

o Can overfit if not pruned.

o Predictions may vary a lot for small changes in data.

• House Price Example:

IF Area > 1500 sq.ft → Price = ₹60 lakh

ELSE IF Bedrooms >= 3 → Price = ₹55 lakh

ELSE → Price = ₹45 lakh
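The example tree can be written as nested if/else statements. Here the thresholds are copied by hand from the rules above, and the first leaf is treated as predicting ₹60 lakh; a decision-tree algorithm would instead learn the split points and leaf values from training data. The function name is hypothetical.

```python
def predict_price_lakh(area_sqft, bedrooms):
    """Hand-written version of the example tree; returns price in ₹ lakh."""
    if area_sqft > 1500:          # first split: area
        return 60
    elif bedrooms >= 3:           # second split: number of bedrooms
        return 55
    else:
        return 45

print(predict_price_lakh(1800, 2))  # 60
print(predict_price_lakh(1200, 3))  # 55
print(predict_price_lakh(900, 2))   # 45
```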


Option 3: Random Forest → more accurate but harder to interpret

• What it does: Builds many decision trees and combines (averages) their results for prediction.

• When to use: When you want high accuracy and have enough data/computation power.

• Pros:

o Reduces overfitting compared to a single tree.

o Handles complex non-linear relationships.

• Cons:

o Not as easy to explain (hundreds of trees).

o Slower to train and predict than Linear Regression.

• House Price Example:

o Tree 1 predicts ₹55 lakh

o Tree 2 predicts ₹52 lakh

o Tree 3 predicts ₹54 lakh

o Final price = average (₹53.67 lakh)

Which to choose?

• If interpretability matters → Linear Regression (easy to explain to clients/stakeholders).

• If data is complex and accuracy is top priority → Random Forest.

• If you want a balance of complexity and interpretability → Decision Tree.

3. Training a Model (Supervised Learning)

Training means teaching the model from historical data so it can learn the relationship between
input features (X) and output labels (Y).

Steps:

1. Split data into:

o Training set (e.g., 80%)

o Test set (e.g., 20%)

2. Feed training data into the algorithm.

3. The model adjusts its internal parameters to reduce error.
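Step 1 (the 80/20 split) can be sketched in plain Python as shuffle-then-cut. The function below is a hand-written, hypothetical helper that mirrors the idea; it is not a library API, and the fixed seed is an assumption to make the split repeatable.

```python
import random


def split_train_test(records, test_fraction=0.2, seed=42):
    """Shuffle a copy of the data, then cut it into a training part and a test part."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)        # fixed seed => repeatable split
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


data = list(range(100))  # stand-in for 100 labelled rows
train, test = split_train_test(data)
print(len(train), len(test))  # 80 20
```

Shuffling first matters: if the data is sorted (e.g., by date or class), cutting without shuffling would give an unrepresentative test set.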

Training a Model in Supervised Learning – Methods with Examples


1. Holdout Method
Concept: Split dataset into training and test sets (commonly 70% train, 30% test).

Example:

 Suppose we have 1000 labeled emails (spam / not spam).

 Split: 700 emails for training, 300 emails for testing.

 Train model (e.g., Decision Tree) on 700 → Predict labels of 300.

 Compare predicted vs actual labels → get accuracy (say 92%).

✅ Simple, fast.
❌ Problem: Rare classes (e.g., very few spam emails) may be under-represented or missing in the test set.
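The "compare predicted vs actual labels" step is a simple count of matches. This minimal sketch uses a tiny hypothetical test set of 5 emails; with the example's 300 test emails, 276 correct predictions would give 276/300 = 0.92, i.e. 92%.

```python
def accuracy(actual, predicted):
    """Fraction of test items whose predicted label matches the actual label."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Hypothetical labels for a tiny test set of 5 emails
actual    = ["spam", "ham", "ham", "spam", "ham"]
predicted = ["spam", "ham", "spam", "spam", "ham"]
print(accuracy(actual, predicted))  # 0.8
```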

Holdout Method

 Data is split into Training (70–80%) and Test (20–30%).

 Training set → used to build model.

 Test set → used to evaluate performance (accuracy, error rate, etc.).

 Random sampling used for splitting.

 Problem: Class imbalance (some classes under-represented in test/train sets).

 Solution: Stratified random sampling → keeps the same class proportions in the training and test sets as in the full dataset.

 Sometimes 3 sets used:

o Training → build model

o Validation → tune/refine model

o Test → final evaluation
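Stratified random sampling can be sketched by splitting each class separately with the same test fraction. The helper name `stratified_split`, the 70/30 ratio and the 900 ham / 100 spam labels are hypothetical choices for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx) so each class keeps its overall proportion."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)            # group row indices by class label
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)                 # random order within each class
        n_test = round(len(idx) * test_fraction)
        test_idx += idx[:n_test]         # same fraction of every class goes to test
        train_idx += idx[n_test:]
    return train_idx, test_idx


# Hypothetical imbalanced data: 900 "ham" and 100 "spam" emails
labels = ["ham"] * 900 + ["spam"] * 100
train_idx, test_idx = stratified_split(labels)
test_spam = sum(labels[i] == "spam" for i in test_idx)
print(len(train_idx), len(test_idx), test_spam)  # 700 300 30
```

The test set gets exactly 30 spam emails (10% of 300), matching the 10% spam rate of the full dataset, which plain random sampling does not guarantee.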


2. K-Fold Cross-Validation

📌 Concept: Split data into k equal folds, rotate test fold each time.

Example (10-Fold CV):

 Dataset = 1000 patient records (disease present / absent).

 Split into 10 folds (each = 100 records).

 Iteration 1 → Fold 1 = test, folds 2–10 = train.

 Iteration 2 → Fold 2 = test, folds 1,3–10 = train.

 … Repeat until all folds tested.

 Average accuracy across 10 runs = final performance (say 88%).

Special case: LOOCV (Leave-One-Out CV):

 Dataset = 1000 records → train on 999, test on 1.

 Repeat 1000 times.

 Uses almost all the data for training in every run, but is very slow (the model is trained 1000 times).
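The fold rotation can be sketched as an index generator. This hypothetical helper uses contiguous folds and assumes n is divisible by k (as in 1000 / 10); real code would usually shuffle the data first. LOOCV is the special case k = n.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs; each of the k folds is the test set exactly once."""
    fold_size = n // k                      # assumes n is divisible by k
    for f in range(k):
        start, end = f * fold_size, (f + 1) * fold_size
        test_idx = list(range(start, end))
        train_idx = [i for i in range(n) if not (start <= i < end)]
        yield train_idx, test_idx


folds = list(k_fold_indices(1000, 10))
print(len(folds), len(folds[0][0]), len(folds[0][1]))  # 10 900 100

# LOOCV: k equals the number of records, so each test fold holds one record
loo = list(k_fold_indices(5, 5))
print([test for _, test in loo])  # [[0], [1], [2], [3], [4]]
```

After training and scoring the model once per fold, the final performance is the average of the k scores.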


3. Bootstrap Sampling

📌 Concept: Sampling with replacement → training set may contain duplicate records. The records never picked (on average about 36.8% of the data for large datasets) become the test set.

What is it?

 A method to create training and test datasets using random sampling with replacement.

 “With replacement” means:

o After picking one record, we put it back into the dataset, so it can be chosen again.

o That’s why the same record may appear multiple times in training.

o Some records may not appear at all → those become test set.

🔹 Why do we need it?

 Useful when dataset is small (not enough data for normal holdout or k-fold).

 By resampling many times, we can generate multiple training sets and get a better idea of
model performance.

🔹 Example

Suppose dataset has 5 students:

D = {S1, S2, S3, S4, S5}


We want a bootstrap sample of size 5 (same as original size).

👉 Randomly select with replacement:

 1st pick → S3

 2nd pick → S5

 3rd pick → S3 (again, because replacement allows repeats)

 4th pick → S1

 5th pick → S2

Training set = {S3, S5, S3, S1, S2}


(Notice S3 is repeated, S4 missing)

Test set = {S4}


(Anything not picked goes to test set)
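The student example can be reproduced with a short sketch; the helper name `bootstrap_split` and the seed are hypothetical, so the exact picks will differ from the hand-worked example. The last lines show where the ≈36.8% figure comes from: the chance that one record is never picked in n draws is (1 − 1/n)^n, which approaches 1/e ≈ 0.368.

```python
import random


def bootstrap_split(records, seed=None):
    """Draw len(records) items WITH replacement for training; never-drawn items form the test set."""
    rng = random.Random(seed)
    train = [rng.choice(records) for _ in records]   # duplicates are allowed
    test = [r for r in records if r not in train]    # records never picked
    return train, test


students = ["S1", "S2", "S3", "S4", "S5"]
train, test = bootstrap_split(students, seed=1)
print(train, test)

# Probability that a given record is never picked in n draws
n = 1000
print(round((1 - 1 / n) ** n, 3))  # 0.368
```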

4. Lazy vs. Eager Learners

Eager Learner Example

 Suppose we use Decision Tree to classify fruits.

 Training data: [Apple = red+round, Banana = yellow+long, Orange = orange+round].

 The tree learns general rules:

o If color=yellow → Banana

o If color=red & round → Apple

o If color=orange & round → Orange

 ✅ Model is ready → Quick prediction.

 ❌ Training takes longer.

Lazy Learner Example (KNN)

 Training data: same fruit dataset.

 New test fruit: round, red.

 KNN directly compares with training set (doesn’t build rules).

 Finds nearest neighbors → predicts “Apple”.

 ✅ Training is instant.

 ❌ Prediction is slow (must compare each time).
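A lazy learner does all its work at prediction time. This minimal KNN sketch uses the fruit data above; the mismatch-count distance and k = 1 are assumptions for these categorical features (KNN on numeric features usually uses Euclidean distance), and the function names are hypothetical.

```python
from collections import Counter

# Training data from the example: (colour, shape) -> fruit
training = [
    (("red", "round"), "Apple"),
    (("yellow", "long"), "Banana"),
    (("orange", "round"), "Orange"),
]


def mismatch_distance(a, b):
    """Count how many attributes differ (a simple distance for categorical features)."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(query, data, k=1):
    """Lazy learner: nothing is pre-built; every prediction scans the whole training set."""
    neighbours = sorted(data, key=lambda item: mismatch_distance(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]   # majority label among the k nearest


print(knn_predict(("red", "round"), training))  # Apple
```

Nothing is computed until `knn_predict` is called, which is why training is instant but each prediction has to compare against every stored example.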

MODEL REPRESENTATION AND INTERPRETABILITY


1. Bias
When we build a statistical or ML model, we face two types of error: bias and variance. Bias and variance are the causes; underfitting and overfitting are the outcomes.
Bias: error because the model is too simple and fails to capture important patterns.
1. Real-life (Music Example): Imagine you are recording a song.
The real music has drums, guitar, and vocals (complex pattern).
But you use a very cheap microphone that only records in mono and cuts off high/low frequencies.
Result: The recording captures only a basic, flat version of the song → many details (patterns) are missed.

2. Graph/Math Example
 Suppose true data is a curve (quadratic).
 You try to fit a straight line.
 Model misses the curve → poor performance on both training and test.
Why it matters?
o Model cannot learn the real trend.
o Errors stay high even if we give more data.
o Leads to underfitting.
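The Graph/Math example can be reproduced in a few lines: fit a straight line by ordinary least squares to points that lie exactly on y = x². The data points are hypothetical. Even with zero noise the training error stays large, which is the signature of high bias.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x (a deliberately simple, high-bias model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# True pattern is a curve: y = x^2 (no noise), x from -3 to 3
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

a, b = fit_line(xs, ys)
train_error = mse(ys, [a + b * x for x in xs])
print(round(train_error, 3))  # 12.0
```

The best line here is flat (y = 4), so the model misses the curve entirely, even on the data it was trained on.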

2. Variance
Error because the model is too complex: it fits even random noise and performs badly on new data.

 When it happens?
o Model is too complex.
o Too many parameters/features.
o Not enough regularization.
o Small training dataset (model memorizes instead of
generalizing).
Real-life (Music Example)
 Using a super-sensitive mic that records every cough, fan noise,
or chair movement.
 You get the song plus noise.
 Playback is messy, doesn’t sound good outside studio.
Works well on training but fails badly on test.
 Effect: Model captures noise instead of general trend.
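One way to see high variance is a model that simply memorises its training points. This sketch uses a 1-nearest-neighbour regressor as the "too flexible" model; that choice, the y = 2x trend and the fixed noise values are all hypothetical illustrations, not from the notes. Training error is zero, but error on new points is clearly higher.

```python
def nearest_neighbour_regressor(train_x, train_y):
    """Very flexible model: predicts the y of the single closest training x (memorises data)."""
    def predict(x):
        i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
        return train_y[i]
    return predict


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Hypothetical data: true trend y = 2x, plus a fixed "noise" term on each training point
train_x = [0, 1, 2, 3, 4, 5]
noise = [1.5, -2.0, 1.0, -1.5, 2.0, -1.0]
train_y = [2 * x + e for x, e in zip(train_x, noise)]

model = nearest_neighbour_regressor(train_x, train_y)
train_mse = mse(train_y, [model(x) for x in train_x])

# New points between the training points, labelled with the noise-free trend
test_x = [0.4, 1.4, 2.4, 3.4, 4.4]
test_y = [2 * x for x in test_x]
test_mse = mse(test_y, [model(x) for x in test_x])
print(train_mse, round(test_mse, 2))
```

The model reproduces the noise exactly on training data (MSE 0) and carries that noise into its predictions on new points.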

3. Underfitting in Machine Learning


🔹 Definition:
Underfitting happens when a model is too simple to capture the hidden
patterns of the data.
It fails both on:
 Training data ❌ (cannot learn enough), and
 Test data ❌ (cannot generalize).

🔹 Causes of Underfitting
1. Too simple model → e.g., Linear regression for non-linear curved data.
2. Too few features → Important variables not included.
3. Insufficient training time → Model not trained properly.
4. Too much regularization → Restrictions stop the model from learning.
5. Not enough training data → Model doesn’t see enough examples to
learn.

🔹 Solution to Underfitting
✅ Use more complex models (e.g., polynomial regression, decision trees).
✅ Add more features.
✅ Reduce regularization.
✅ Train for longer time.
✅ Use more training data (if available).

4. Overfitting
🔹 Definition
Overfitting happens when a model learns the training data too well — including noise and
outliers.
 It performs excellently on training data ✅
 But poorly on unseen/test data ❌

🔹 Cause (Why it happens?)


1. Model is too complex (too many parameters/features).
2. Small training dataset → model memorizes instead of generalizing.
3. Too many training iterations (learning for too long).
4. No regularization (no penalty for complexity).

🔹 Solution (How to avoid?)


1. Use cross-validation (k-fold).
2. Keep aside a validation set to monitor performance.
3. Simplify the model (reduce parameters, prune decision trees).
4. Apply regularization (L1, L2, Dropout in neural networks).
5. Collect more training data.

🔹 Exam Answer (Short form)


Overfitting occurs when the model fits the training data too closely, including noise and
outliers.
 Causes: complex model, small data, no regularization.
 Effect: high training accuracy, low test accuracy (poor generalization).
 Avoided using: cross-validation, validation set, regularization, pruning, more data.
EVALUATING PERFORMANCE OF A MODEL
Supervised Learning – Classification (Theory +
Example)
1. Classification in Supervised Learning
 Goal: Assign a class label (Win/Loss, Malignant/Benign,
Spam/Not Spam, etc.)
 Uses predictor features → (e.g., toss result, no. of spinners,
past wins).
 Model is evaluated by comparing predictions with actual
outcomes.
 Confusion Matrix: A confusion matrix is a table that shows how well a classification model performs.
 It compares the actual labels (ground truth) with the predicted labels (model
output).
 Mainly used for classification problems (binary or multi-class).

Imagine a Cricket Prediction Model


We built a model to predict whether India will Win or Lose a match.
👉 Sometimes it predicts correctly ✅
👉 Sometimes it predicts wrongly ❌
We need a way to check how good this model is → that's where the
Confusion Matrix comes in.

2. Confusion Matrix – Like a Scoreboard


Actual \ Predicted | Predicted Win | Predicted Loss
Actual Win | ✅ TP (True Positive) → model said Win & India actually Won | ❌ FN (False Negative) → model said Loss but India actually Won
Actual Loss | ❌ FP (False Positive) → model said Win but India actually Lost | ✅ TN (True Negative) → model said Loss & India actually Lost
👉 So:
 TP = Correct Win prediction
 TN = Correct Loss prediction
 FP = Wrongly said Win
 FN = Wrongly said Loss
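Filling the four cells is just counting label pairs. This sketch treats "Win" as the positive class; the helper name `confusion_counts` and the five match results are hypothetical.

```python
def confusion_counts(actual, predicted, positive="Win"):
    """Count TP, FN, FP, TN for a binary problem where `positive` is the 'Win' class."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1          # said Win, India won
        elif a == positive:
            fn += 1          # said Loss, India won
        elif p == positive:
            fp += 1          # said Win, India lost
        else:
            tn += 1          # said Loss, India lost
    return tp, fn, fp, tn


actual    = ["Win", "Win", "Loss", "Loss", "Win"]
predicted = ["Win", "Loss", "Win", "Loss", "Win"]
print(confusion_counts(actual, predicted))  # (2, 1, 1, 1)
```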

3. Key Performance Metrics


(a) Accuracy
Proportion of all correct predictions.
 Good for balanced datasets, but misleading when data is
imbalanced.
 Example: Predicting disease where 99% people are healthy →
model says “healthy” for all → 99% accuracy but useless.
(f) Recall (same as Sensitivity)
Same as Sensitivity: TP / (TP + FN).

(g) Kappa Statistic (κ)


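 Adjusted accuracy → removes “chance agreement”: κ = (p_o − p_e) / (1 − p_e), where p_o = observed accuracy and p_e = accuracy expected by chance.
 κ = 1 → Perfect, κ = 0 → no better than random guessing.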

Example Numbers (100 Matches)


 India actually won 87 times.
 India actually lost 13 times.
Model prediction results:
 TP = 85 (correct Win predicted)
 FN = 2 (missed Win)
 FP = 4 (wrongly said Win)
 TN = 9 (correct Loss predicted)

4. Now Formulas (Think Simple)


 Accuracy = How often the model is right
(TP + TN) / Total = (85 + 9) / 100 = 94%
 Error Rate = How often the model is wrong
(FP + FN) / Total = (4 + 2) / 100 = 6%
 Sensitivity / Recall = Out of all actual Wins, how many did we catch?
TP / (TP + FN) = 85 / 87 = 97.7%
 Specificity = Out of all actual Losses, how many did we catch?
TN / (TN + FP) = 9 / 13 = 69.2%
 Precision = Out of predicted Wins, how many were correct?
TP / (TP + FP) = 85 / 89 = 95.5%
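The same formulas can be checked in code with the 100-match numbers. The kappa line uses the standard Cohen's kappa definition, where chance agreement p_e = P(actual Win) × P(predicted Win) + P(actual Loss) × P(predicted Loss); the function name is hypothetical.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute the metrics from the four confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    # Chance agreement: P(actual Win)*P(pred Win) + P(actual Loss)*P(pred Loss)
    p_e = ((tp + fn) / total) * ((tp + fp) / total) + ((fp + tn) / total) * ((fn + tn) / total)
    return {
        "accuracy": accuracy,
        "error_rate": (fp + fn) / total,
        "recall": tp / (tp + fn),            # sensitivity
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "kappa": (accuracy - p_e) / (1 - p_e),
    }


m = classification_metrics(tp=85, fn=2, fp=4, tn=9)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.940, error_rate: 0.060, recall: 0.977,
# specificity: 0.692, precision: 0.955, kappa: 0.716
```

Kappa (≈0.72) is noticeably lower than accuracy (94%) because India wins most matches, so a lot of agreement would happen by chance anyway.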

5. When to Use Which Metric?


 Medical case (Tumor detection) → Focus on
Sensitivity (don’t miss a patient with tumor).
 Spam filtering → Focus on Precision (don’t
wrongly block important emails).
 General cases → Accuracy, Precision, Recall all
matter.

✅ Memory Shortcuts:
 Recall = "Did we catch all the real Wins?"
 Precision = "Are predicted Wins mostly correct?"
 Specificity = "Did we catch the real Losses
correctly?"
The Trade-Off: Precision vs. Recall (The Most Important
Concept)
You usually can't have both perfect at once. Improving one
often hurts the other.
 High Precision, Lower Recall: A conservative
model. It only predicts "Positive" when
it's extremely sure.
o Spam Example: If you set the filter to be very
strict (high Precision), you'll have very few
non-spam emails in Spam (low FP), but you'll
also let more spam slip into your inbox (high
FN → low Recall).
 High Recall, Lower Precision: An aggressive
model. It tries to catch all positives, even if it
means making some mistakes.
o Spam Example: If you set the filter to
catch every single spam (high Recall), it will
also be quick to mark suspicious emails as
spam, risking that some important emails get
caught (high FP → low Precision).
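The trade-off shows up when a spam filter's decision threshold is moved. This sketch uses hypothetical model scores and labels, and returning precision 1.0 when nothing is flagged is an assumed convention. A strict threshold gives high precision and low recall; a lenient one gives the reverse.

```python
def precision_recall_at(threshold, scores, labels):
    """Predict spam when score >= threshold, then compute precision and recall."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0   # convention when nothing is flagged
    recall = tp / (tp + fn)
    return precision, recall


# Hypothetical spam scores from a model (1 = spam, 0 = not spam)
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

for t in (0.85, 0.5, 0.25):
    p, r = precision_recall_at(t, scores, labels)
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
# threshold=0.85: precision=1.00, recall=0.50   (strict filter)
# threshold=0.5: precision=0.60, recall=0.75
# threshold=0.25: precision=0.57, recall=1.00   (aggressive filter)
```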
In regression the output is a continuous number (e.g., house price), so we cannot use a confusion matrix (that's only for classification).
👉 Instead, we check how far predicted values are from actual values.

2. Key Metrics in Regression


3. Example (House Price Prediction)
Suppose actual vs predicted house prices (in ₹ lakhs):
House | Actual (y) | Predicted (ŷ)
1 | 50 | 48
2 | 60 | 65
3 | 40 | 42
4 | 55 | 50
Step 1: Errors
Error = Actual – Predicted
 House1 → 50 – 48 = 2
 House2 → 60 – 65 = –5
 House3 → 40 – 42 = –2
 House4 → 55 – 50 = 5
Step 2: Metrics
 MAE = (|2|+|–5|+|–2|+|5|)/4 = (14)/4 = 3.5
 MSE = (2² + (–5)² + (–2)² + 5²)/4 = (4+25+4+25)/4 =
58/4 = 14.5
 RMSE = √14.5 ≈ 3.8
 R² → tells how much of the variance in the target the model explains. If R² = 0.90 → the model explains 90% of the variation (this is not the same as 90% accuracy).
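The four-house example can be checked in code. R² is computed here with the standard definition R² = 1 − SS_res / SS_tot; for these four houses it comes out to about 0.735, meaning the predictions explain roughly 73.5% of the variation in price.

```python
import math

actual    = [50, 60, 40, 55]   # ₹ lakh, from the table
predicted = [48, 65, 42, 50]

errors = [y - y_hat for y, y_hat in zip(actual, predicted)]   # [2, -5, -2, 5]
n = len(errors)

mae = sum(abs(e) for e in errors) / n        # mean absolute error
mse = sum(e ** 2 for e in errors) / n        # mean squared error
rmse = math.sqrt(mse)                        # back in the target's units (₹ lakh)

mean_y = sum(actual) / n
ss_res = sum(e ** 2 for e in errors)                  # unexplained variation
ss_tot = sum((y - mean_y) ** 2 for y in actual)       # total variation around the mean
r2 = 1 - ss_res / ss_tot

print(mae, mse, round(rmse, 2), round(r2, 3))  # 3.5 14.5 3.81 0.735
```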
4. When to Use Which?
 MAE → Simple, average error size.
 MSE / RMSE → When large errors must be heavily
penalized.
 R² → For overall model quality (goodness of fit).

✅ Shortcut memory:
 MAE = Mean of |errors|: Average of the absolute
difference between Actual value and predicted
value
 MSE = Mean of (errors²): Average of the squared difference between actual and predicted values
 RMSE = Square root of MSE
 R² = How much of the variation in the target is explained by the model
🔹 Step 1: Understand what MAE means: it tells us how far predictions are from actual values, on average.
 MAE = average mistake size
 Example: MAE = 50 → on average, your prediction is off by 50 units.

🔹 Step 2: Why the target scale matters


Imagine two cases:
1. House Price (₹10,00,000)
o MAE = 50
o Relative error = 50 / 10,00,000 × 100 = 0.005% → excellent.
o Means your predictions are almost perfect.
2. Product Price (₹300)
o MAE = 50
o Relative error = 50 / 300 × 100 = 16.7% → poor.
o Being off by ~17% of the price is usually not acceptable.
👉 That's why we don't just look at “50”; we look at “50 compared to what”.
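The "compared to what" check is one division. This sketch divides the MAE by a typical target value for each case; the helper name and the choice of "typical value" as the denominator are assumptions for illustration.

```python
def relative_error_percent(mae, typical_target):
    """Express an average error as a percentage of a typical target value."""
    return mae / typical_target * 100


print(round(relative_error_percent(50, 10_00_000), 3))  # house: 0.005
print(round(relative_error_percent(50, 300), 1))        # product: 16.7
```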

Feature engineering
Feature
 A feature = attribute (column) of a dataset used in
ML.
 Features = dataset dimensions.
Example: Iris dataset has 5 features: Sepal.Length,
Sepal.Width, Petal.Length, Petal.Width, Species.
→ Predictor features = first 4
→ Class feature = Species

🔹 Feature Engineering
 Process of creating, selecting, or transforming
features to improve model performance.
 Two main parts:
1. Feature Transformation: It means changing
existing features into a new form so that the
model can understand and learn better.
It does NOT create new information, but represents the data in a better way.
Feature transformation transforms the data (structured or unstructured) into a new set of features.
Two types:
A. Feature Construction = create new
features from old ones (adds
dimensions).
Example: Apartment length + breadth
→ construct new feature Area =
length × breadth.
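Feature construction for the apartment example is a new column computed from existing ones. The two apartment records below are hypothetical.

```python
# Hypothetical apartment records with length and breadth in feet
apartments = [
    {"length": 40, "breadth": 30},
    {"length": 25, "breadth": 20},
]

for apt in apartments:
    # Feature construction: derive a new column from existing ones
    apt["area"] = apt["length"] * apt["breadth"]

print([apt["area"] for apt in apartments])  # [1200, 500]
```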
B. Feature Extraction is the process of
extracting or creating a new set of features from
the original set of features using some
functional mapping.

2. Feature Selection (Subset Selection)


 No new feature is made.
 Just pick the most important features
from all.
 Example: From 100 features, maybe only
20 matter → keep those.
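One simple way to decide which features "matter" is to rank them by how strongly they correlate with the target and keep the top k. This is only one possible selection criterion, chosen here as an assumption; the feature names, values and helper functions are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def select_top_k(features, target, k):
    """Keep the k feature names with the largest |correlation| to the target."""
    ranked = sorted(features, key=lambda name: abs(pearson(features[name], target)), reverse=True)
    return ranked[:k]


# Hypothetical data: 3 candidate features, target = house price (₹ lakh)
features = {
    "area": [1000, 1200, 1500, 1800, 2000],
    "bedrooms": [2, 2, 3, 3, 4],
    "door_colour_code": [3, 1, 4, 1, 5],
}
price = [40, 46, 55, 63, 70]
print(select_top_k(features, price, k=2))  # ['area', 'bedrooms']
```

No new feature is created: the selected names are a subset of the original columns.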
