STUDENT HANDOUT - 3
Machine Learning Fundamentals
Topic Name: Machine Learning Fundamentals
Includes: Supervised Learning • Unsupervised Learning • Reinforcement Learning • Model
Evaluation Metrics • Overfitting & Underfitting
🤖 What is Machine Learning?
Machine Learning (ML) is a subset of Artificial Intelligence (AI) that enables systems to learn from
data, identify patterns, and make decisions or predictions with minimal human intervention.
Instead of being explicitly programmed, ML models "learn" from data, allowing them to adapt and
improve their performance over time.
🔍 Types of Machine Learning
Machine learning is broadly divided into three main types, based on the kind of learning signal or
feedback available to the learning system:
📌 Supervised Learning
Supervised learning is a type of machine learning where the model learns from a labeled dataset,
meaning the training data includes both input features and their corresponding correct output
labels. The goal is for the model to learn a mapping from inputs to outputs so it can predict outputs
for new, unseen data.
● How it works: The model is "supervised" because it sees the correct answers during training.
It learns by minimizing a loss function that measures the difference between its predictions
and the actual labels.
● Common Tasks:
○ Classification: Predicting a categorical label (e.g., spam detection, image
recognition of cats vs. dogs).
○ Regression: Predicting a continuous numerical value (e.g., house price prediction,
stock market forecasting).
● Examples: Image classification, email spam detection, predicting customer churn.
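The "How it works" point above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a recommended production approach: it fits a straight line to a tiny labeled dataset using plain gradient descent on the mean squared error. The dataset, the learning rate, the step count, and the names fit_line and predict are all made up for this example.

```python
# Labeled training data (hypothetical): inputs x with their correct outputs y.
# The labels follow y = 3x + 1, which the model has to discover from the data.
x_train = [0.0, 1.0, 2.0, 3.0, 4.0]
y_train = [1.0, 4.0, 7.0, 10.0, 13.0]


def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Difference between current predictions and the actual labels.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        # Step against the gradient so the error shrinks.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def predict(w, b, x):
    """Apply the learned mapping to an input, including unseen ones."""
    return w * x + b


w, b = fit_line(x_train, y_train)
print(f"learned w={w:.3f}, b={b:.3f}")
print(f"prediction for unseen x=10: {predict(w, b, 10.0):.3f}")
```

The input x=10 never appears in the training data, so the last line shows the learned mapping being applied to new, unseen data. This example is a regression task; a classifier follows the same pattern with a different model and loss.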
📌 Unsupervised Learning
Unsupervised learning deals with unlabeled data. In this approach, the model tries to find hidden
patterns, structures, or relationships within the input data without any prior knowledge of output
labels. The goal is to explore the data and discover inherent groupings or features.
● How it works: The model infers patterns directly from the input data, often by looking for
similarities or differences.
● Common Tasks:
○ Clustering: Grouping similar data points together (e.g., customer segmentation,
document categorization).
○ Dimensionality Reduction: Reducing the number of features in a dataset while
retaining important information (e.g., Principal Component Analysis, or PCA).
○ Association Rule Mining: Discovering relationships between variables in large
datasets (e.g., market basket analysis).
● Examples: Recommender systems (e.g., "customers who bought this also bought..."),
anomaly detection.
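Clustering, the first task listed above, can be illustrated with a small k-means sketch. It is written for one-dimensional data to keep it short, and it takes the starting centroids as an argument, which is a simplification: practical implementations usually choose them randomly or with a seeding heuristic. The function name k_means_1d and the spend figures are hypothetical.

```python
def k_means_1d(values, centroids, max_iters=100):
    """Group numbers around centroids, starting from the given centroids."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iters):
        # Assignment step: each value joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
            for v in values
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [v for v, lab in zip(values, labels) if lab == k]
            new_centroids.append(sum(members) / len(members) if members else old)
        if new_centroids == centroids:
            break  # Centroids stopped moving, so assignments are stable.
        centroids = new_centroids
    return labels, centroids


# Hypothetical spend per customer. Note that no labels are provided.
monthly_spend = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
labels, centroids = k_means_1d(
    monthly_spend, centroids=[monthly_spend[0], monthly_spend[-1]]
)
print(labels)     # [0, 0, 0, 1, 1, 1]
print(centroids)  # [1.5, 9.5]
```

The algorithm is never told which customers belong together. The two groups emerge only from how close the values are to one another.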
📌 Reinforcement Learning
Reinforcement Learning (RL) involves an agent learning to make sequential decisions in an
environment to maximize a cumulative reward. The agent learns through trial and error, receiving
feedback in the form of rewards or penalties for its actions. There are no labeled datasets; instead,
the agent learns by interacting with its environment.
● How it works: The agent performs an action, observes the resulting state and reward from the
environment, and then adjusts its strategy (its policy) to maximize future rewards.
● Key Components: Agent, Environment, States, Actions, Rewards.
● Examples: Game playing (e.g., AlphaGo), robotics, autonomous driving.
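The agent/environment loop described above can be sketched with tabular Q-learning, one common RL algorithm among many. The environment here is a made-up five-cell corridor where the agent starts at cell 0 and is rewarded only for reaching cell 4. The constants (ALPHA, GAMMA, EPSILON, the episode count) and the helper names step and choose_action are assumptions chosen for illustration.

```python
import random

random.seed(0)  # Fixed seed so the run is repeatable.

N_STATES = 5             # Corridor cells 0..4 (the states).
GOAL = N_STATES - 1      # Reaching cell 4 ends the episode.
ACTIONS = ["left", "right"]
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # Learning rate, discount, exploration.


def step(state, action):
    """Environment: move one cell; reward 1 only when the goal is reached."""
    if action == "right":
        next_state = min(state + 1, GOAL)
    else:
        next_state = max(state - 1, 0)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def choose_action(q, state):
    """Epsilon-greedy choice: usually exploit, sometimes explore."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    best = max(q[state].values())
    # Break ties at random so the untrained agent does not favor one action.
    return random.choice([a for a in ACTIONS if q[state][a] == best])


# Q-table: the agent's estimate of future reward for each state and action.
q = {s: {a: 0.0 for a in ACTIONS} for s in range(N_STATES)}

for _ in range(500):  # Episodes of trial and error.
    state = 0
    while state != GOAL:
        action = choose_action(q, state)
        next_state, reward = step(state, action)
        # Q-learning update: nudge the estimate toward
        # reward + discounted best estimate from the next state.
        target = reward + GAMMA * max(q[next_state].values())
        q[state][action] += ALPHA * (target - q[state][action])
        state = next_state

# Greedy policy learned for every non-goal cell.
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]
print(policy)
```

All five key components appear in the sketch: the agent is the Q-table plus choose_action, the environment is step, the states are the corridor cells, the actions are "left" and "right", and the reward is the value returned by step.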
🔍 Model Evaluation Metrics
Once a machine learning model is trained, it's crucial to evaluate its performance. Different metrics
are used depending on the type of ML task:
● For Classification:
○ Accuracy: Proportion of all instances that are classified correctly; can be misleading
when classes are imbalanced.
○ Precision: Proportion of true positive predictions among all positive predictions.
○ Recall (Sensitivity): Proportion of true positive predictions among all actual
positive instances.
○ F1-Score: Harmonic mean of precision and recall, useful for imbalanced datasets.
○ Confusion Matrix: A table showing true positives, true negatives, false positives,
and false negatives.
● For Regression:
○ Mean Absolute Error (MAE): Average of the absolute differences between
predictions and actual values.
○ Mean Squared Error (MSE): Average of the squared differences between
predictions and actual values; penalizes larger errors more heavily.
○ Root Mean Squared Error (RMSE): Square root of MSE, providing error in the
same units as the target variable.
○ R-squared (R²): Represents the proportion of variance in the dependent variable
that can be predicted from the independent variable(s).
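The definitions above translate directly into code. The sketch below computes them by hand on small made-up label lists so each formula stays visible. The function names and the convention of returning 0.0 when a denominator is zero are assumptions for this example; libraries handle those edge cases in their own ways.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred, positive=1):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    # Assumed convention: report 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def regression_metrics(y_true, y_pred):
    """Assumes the true values are not all identical (needed for R-squared)."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)               # Unexplained variation.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # Total variation.
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Hypothetical classifier output: 3 TP, 5 TN, 1 FP, 1 FN.
cls = classification_metrics(
    y_true=[1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    y_pred=[1, 1, 1, 0, 1, 0, 0, 0, 0, 0],
)
print(cls)  # accuracy 0.8; precision, recall and F1 all 0.75

# Hypothetical regression output with errors of 1, 0, -1 and -2.
reg = regression_metrics(
    y_true=[3.0, 5.0, 7.0, 9.0],
    y_pred=[2.0, 5.0, 8.0, 11.0],
)
print(reg)  # MAE 1.0, MSE 1.5, RMSE about 1.22, R-squared about 0.7
```

In the regression example the single error of size 2 contributes 4 of the 6 units of squared error, which is what "penalizes larger errors more heavily" means for MSE.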
📌 Overfitting & Underfitting
These are common problems encountered during model training:
● Overfitting: Occurs when a model learns the training data too well, including its noise and
outliers. An overfitted model performs exceptionally well on the training data but poorly on
unseen (test) data because it has essentially memorized the training examples rather than
learning generalizable patterns.
○ Symptoms: High accuracy on training data, low accuracy on test data.
○ Solutions: More training data, regularization (L1/L2), feature selection, simpler
models. Cross-validation helps detect overfitting and choose among these options.
● Underfitting: Occurs when a model is too simple to capture the underlying patterns in the
training data. It performs poorly on both training and test data because it hasn't learned
enough from the data.
○ Symptoms: Low accuracy on both training and test data.
○ Solutions: More complex models, more relevant features, reducing regularization.
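The symptoms above can be reproduced on a toy regression problem, using mean squared error in place of accuracy. The sketch compares three deliberately simple models on made-up data that follows y = 2x, with noise added to the training labels only: a model that memorizes training points (1-nearest-neighbor), a model that always predicts the training mean, and a least-squares line. The data and all function names are assumptions for illustration.

```python
def mse(model, xs, ys):
    """Mean squared error of a model (a function of x) on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def make_memorizer(xs, ys):
    """Overfits: returns the label of the closest training point, noise included."""
    def model(x):
        nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
        return ys[nearest]
    return model


def make_mean_model(ys):
    """Underfits: ignores the input and always predicts the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def make_line_model(xs, ys):
    """Reasonable fit: ordinary least-squares straight line."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


# Training labels: y = 2x with noise of +1 or -1 added (hypothetical data).
x_train = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y_train = [1.0, 1.0, 5.0, 5.0, 9.0, 9.0]
# Test labels: y = 2x without noise, at inputs not seen during training.
x_test = [0.4, 1.6, 2.4, 3.6, 4.4]
y_test = [0.8, 3.2, 4.8, 7.2, 8.8]

models = {
    "memorizer": make_memorizer(x_train, y_train),
    "mean": make_mean_model(y_train),
    "line": make_line_model(x_train, y_train),
}

# For each model: (training error, test error).
results = {
    name: (mse(m, x_train, y_train), mse(m, x_test, y_test))
    for name, m in models.items()
}
for name, (train_err, test_err) in results.items():
    print(f"{name:10s} train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

The memorizer has zero training error but a much larger test error (the overfitting symptom), the mean model has a large error on both sets (the underfitting symptom), and the line has a small error on both.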
💻 Practical Application
● Supervised Learning: Building a model to predict if an email is spam (classification) or to
predict a house price based on its features (regression).
● Unsupervised Learning: Grouping customers into segments based on their purchasing
behavior or reducing the number of features in a large dataset to make it more manageable.
● Reinforcement Learning: Training an AI agent to navigate a maze or play a video game.
● Evaluation Metrics: Using accuracy, precision, or MSE to assess how well your trained
model is performing on new data.
● Addressing Overfitting/Underfitting: Diagnosing if your model is too complex or too
simple by looking at its performance on training vs. test sets and applying appropriate
techniques.
🧾 Key Takeaways
● Machine Learning allows systems to learn from data without explicit programming.
● Supervised Learning uses labeled data for tasks like classification and regression.
● Unsupervised Learning finds patterns in unlabeled data, used for clustering and
dimensionality reduction.
● Reinforcement Learning involves agents learning through trial and error to maximize
rewards in an environment.
● Model evaluation metrics are crucial for assessing model performance (e.g., accuracy,
precision, MAE).
● Overfitting occurs when a model memorizes training data, performing poorly on new data.
● Underfitting occurs when a model is too simple to capture patterns, performing poorly
overall.