Module-4
Advanced Machine Learning Techniques:
Ensemble Methods:
Ensemble Learning is a machine learning paradigm in which multiple models (often called "weak learners") are trained and combined to solve the same problem, achieving better performance than any single model alone.
Here are the main ensemble learning techniques:
✅ 1. Bagging (Bootstrap Aggregating)
Goal: Reduce variance (overfitting).
How: Train multiple models on different
random subsets (with replacement) of the
training data, and average (or vote) their
predictions.
Common Algorithm:
o Random Forest (ensemble of decision trees)
Use case: High-variance models like decision
trees.
✅ 2. Boosting
Goal: Reduce bias (underfitting).
How: Train models sequentially; each model
learns from the mistakes of the previous one by
giving more weight to misclassified instances.
Popular Algorithms:
o AdaBoost (Adaptive Boosting)
o Gradient Boosting Machines (GBM)
o XGBoost, LightGBM, CatBoost (optimized
implementations)
Use case: Structured/tabular data with
moderate to large size.
✅ 3. Stacking (Stacked Generalization)
Goal: Combine multiple diverse models using a
meta-model.
How: Train different models on the training set,
then use their predictions as input features to
train a higher-level model (meta-learner).
Example:
o Level-0 models: SVM, Logistic Regression,
Random Forest
o Level-1 model (meta-model): Logistic
Regression
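A minimal sketch of this level-0/level-1 setup using scikit-learn's StackingClassifier (the dataset, estimator labels, and hyperparameters here are illustrative, not from the notes):
python
# Stacking: level-0 models feed their predictions to a level-1 meta-model
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Level-0 models: SVM, Logistic Regression, Random Forest
level0 = [('svm', SVC(probability=True, random_state=42)),
          ('lr', LogisticRegression(max_iter=1000)),
          ('rf', RandomForestClassifier(random_state=42))]

# Level-1 meta-model: Logistic Regression trained on level-0 predictions
stack = StackingClassifier(estimators=level0,
                           final_estimator=LogisticRegression(max_iter=1000))
stack.fit(X_train, y_train)
print("Stacking accuracy:", stack.score(X_test, y_test))
By default, StackingClassifier generates the level-0 predictions with internal cross-validation, which is the detail that distinguishes stacking from blending below.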
✅ 4. Voting
Goal: Simple combination of predictions from
different models.
Types:
o Hard Voting: Majority class wins.
o Soft Voting: Average predicted probabilities
(requires probability outputs).
Use case: When you have a mix of models and
want a quick ensemble.
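A quick sketch with scikit-learn's VotingClassifier; the particular model mix is an arbitrary illustration:
python
# Voting: combine heterogeneous models by majority vote or averaged probabilities
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, random_state=42)

clf = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=1000)),
                ('rf', RandomForestClassifier(random_state=42)),
                ('nb', GaussianNB())],
    voting='soft')  # 'hard' = majority class; 'soft' = average probabilities
clf.fit(X, y)
print(clf.predict(X[:5]))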
✅ 5. Blending
Similar to stacking but uses a hold-out
validation set instead of cross-validation to train
the meta-model.
Faster but may lead to overfitting if the
validation set is too small.
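Since blending is usually hand-rolled rather than provided as a library class, here is a sketch under illustrative choices of base models and hold-out size:
python
# Blending: base models fit on the training split; the meta-model fits
# on their predictions over a separate hold-out split
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)

base_models = [SVC(probability=True, random_state=42),
               RandomForestClassifier(random_state=42)]
for m in base_models:
    m.fit(X_train, y_train)

# Hold-out predictions become the meta-model's input features
meta_features = np.column_stack([m.predict_proba(X_hold)[:, 1]
                                 for m in base_models])
meta_model = LogisticRegression().fit(meta_features, y_hold)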
Later sections cover two boosting implementations in depth: Gradient Boosting Machines (GBM) and Extreme Gradient Boosting (XGBoost).
Ensemble Methods: Bagging and Boosting:
Ensemble methods are machine learning techniques
that combine predictions from multiple models to
improve performance, reduce variance, and prevent
overfitting. Two of the most popular ensemble
methods are Bagging and Boosting.
1. Bagging (Bootstrap Aggregating)
Bagging reduces variance (overfitting).
Key Idea: Train multiple models independently on
different random subsets of the data and aggregate
their predictions (e.g., by voting or averaging).
Process:
1. Generate multiple bootstrap samples
(random samples with replacement) from
the training data.
2. Train a separate model (usually the same
type, like decision trees) on each sample.
3. Combine predictions:
Classification: majority vote
Regression: average
Goal: Reduce variance and improve model
stability.
Example Algorithms:
o Random Forest (Bagging applied to decision
trees, with additional randomness in feature
selection)
Advantages:
o Handles overfitting well (especially for high-
variance models).
o Easy to parallelize.
Disadvantages:
o Doesn’t reduce bias (underfitting) significantly.
o Large ensembles can be slow in prediction
time.
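A minimal sketch with scikit-learn's BaggingClassifier (in releases before 1.2 the first argument is named base_estimator rather than estimator; all values here are illustrative):
python
# Bagging: many trees, each trained on a bootstrap sample, vote on the class
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=42)

bag = BaggingClassifier(estimator=DecisionTreeClassifier(),
                        n_estimators=50,   # number of bootstrap models
                        bootstrap=True,    # sample with replacement
                        random_state=42)
bag.fit(X, y)
print(bag.predict(X[:5]))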
2. Boosting
Boosting is used to reduce bias (underfitting).
Key Idea: Train models sequentially, where each
new model focuses on correcting the errors of the
previous ones.
Process:
1. Start with an initial weak model (e.g., a
shallow decision tree).
2. Train the next model to focus more on data
points that were mispredicted by the
previous model.
3. Combine the models using a weighted
majority or sum of predictions.
Goal: Reduce bias and build a strong predictive
model from weak learners.
Popular Boosting Algorithms:
o AdaBoost (Adaptive Boosting)
o Gradient Boosting Machines (GBM)
o XGBoost
Advantages:
Often achieves higher accuracy than bagging,
especially on structured/tabular data.
Can handle bias better.
Disadvantages:
More prone to overfitting if not carefully
regularized.
Sequential process is harder to parallelize.
More sensitive to noisy data and outliers.
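A short AdaBoost sketch in scikit-learn, where each depth-1 stump reweights the points its predecessors misclassified (hyperparameters are illustrative; older scikit-learn versions name the first argument base_estimator):
python
# Boosting: sequential weak learners, each focusing on previous mistakes
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=42)

ada = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                         n_estimators=100,
                         learning_rate=0.5,
                         random_state=42)
ada.fit(X, y)
print("Training accuracy:", ada.score(X, y))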
Gradient Boosting Machines (GBM)
Gradient Boosting Machines (GBM) are a powerful
ensemble machine learning technique used for both
regression and classification tasks.
GBM builds models in a sequential, stage-wise
fashion by combining multiple weak learners,
typically decision trees, into a strong predictive
model.
Core Idea of GBM
GBM minimizes a loss function by adding models
(e.g., trees) that correct the errors (residuals) made
by previous models.
The algorithm uses gradient descent to optimize the
model step-by-step.
How GBM Works (Step-by-Step)
1. Initialize the model:
o Start with a simple model (e.g., mean of
targets for regression).
2. Compute residuals:
o Calculate the difference between the actual
and predicted values.
3. Fit a weak learner:
o Train a shallow decision tree (often with
depth 1–5) on the residuals.
4. Update the model:
o Add the new tree's predictions to the
existing model with a learning rate η.
o New prediction: Fₘ(x) = Fₘ₋₁(x) + η·hₘ(x)
5. Repeat:
Continue this process for a specified number of
iterations or until convergence.
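To make steps 1–5 concrete before the library example later in this section, here is a from-scratch sketch for regression with squared loss, where the residuals coincide with the negative gradient (the toy data and hyperparameters are illustrative):
python
# Hand-rolled gradient boosting for regression: each tree fits the
# current residuals, and its (scaled) predictions are added to the model
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

eta = 0.1                               # learning rate
F = np.full_like(y, y.mean())           # 1. initialize with the mean
for _ in range(100):
    residuals = y - F                   # 2. residuals (negative MSE gradient)
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)  # 3. weak learner
    F += eta * tree.predict(X)          # 4. F_m(x) = F_(m-1)(x) + eta * h_m(x)

print("Training MSE:", np.mean((y - F) ** 2))  # 5. repeat until done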
Key Components
Loss function: Determines what the model is
optimizing (e.g., MSE for regression, log-loss for
classification).
Weak learner: Usually decision trees.
Learning rate η: Controls the contribution of
each tree.
Number of trees: More trees can improve
accuracy but increase training time and risk
overfitting.
Subsampling: Often used (as in Stochastic
GBM) to improve generalization and reduce
overfitting.
Advantages
Handles non-linear relationships well.
Highly accurate and flexible.
Can work with different types of loss functions.
Supports both classification and regression.
Disadvantages
Computationally intensive (training is slower).
Can overfit if not properly tuned.
Less interpretable than linear models.
Example of a Gradient Boosting
Machine (GBM)
Problem: Predict whether a customer will buy a
product (binary classification)
We’ll use sklearn.datasets to generate a synthetic dataset.
Code Example (Gradient Boosting
Classifier):
python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# 1. Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=15, n_redundant=5,
                           random_state=42)

# 2. Split into training and testing data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)

# 3. Initialize the Gradient Boosting Classifier
gbm = GradientBoostingClassifier(n_estimators=100,   # number of boosting stages
                                 learning_rate=0.1,  # shrinkage rate
                                 max_depth=3,        # depth of each tree
                                 random_state=42)

# 4. Fit the model
gbm.fit(X_train, y_train)

# 5. Predict on test data
y_pred = gbm.predict(X_test)

# 6. Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
Extreme Gradient Boosting (XGBoost):
Extreme Gradient Boosting (XGBoost) is a powerful
and efficient implementation of the gradient
boosting framework.
It's widely used in machine learning competitions
(like those on Kaggle) and industry applications due
to its high performance, scalability, and ability to
handle various data types and problem settings.
What is XGBoost?
XGBoost stands for Extreme Gradient Boosting.
It is a decision-tree-based ensemble Machine
Learning algorithm that uses a gradient boosting
framework.
Developed by Tianqi Chen, it aims to be highly
efficient, flexible, and portable.
How It Works
XGBoost builds an ensemble (collection) of decision
trees, where each new tree attempts to correct the errors
of the previous ones.
1. Initial Prediction: Start with an initial model (like
predicting the mean).
2. Compute Residuals: Find the difference between
predicted and actual values.
3. Train a Tree: Train a tree to predict the residuals
(errors).
4. Update Model: Add the predictions from the new
tree to improve the model.
5. Repeat: Continue adding trees until stopping
criteria are met.
This is similar to traditional gradient boosting but with
significant performance improvements.
Key Features of XGBoost
Feature: Description
Regularization: helps prevent overfitting using L1 (Lasso) and L2 (Ridge) penalties.
Parallel Processing: speeds up training using multi-core processing.
Tree Pruning: uses max depth rather than max number of nodes (pre-pruning).
Handling Missing Data: automatically learns the best way to handle missing values.
Custom Loss Functions: allows user-defined objective functions.
Sparsity Awareness: optimized for sparse (missing) input data.
Cross-validation: built-in support for k-fold cross-validation.
Key Parameters of XGBoost
Parameter: Description
n_estimators: number of boosting rounds (trees).
max_depth: maximum depth of a tree.
learning_rate: step size shrinkage (also called eta).
subsample: fraction of data used per tree.
colsample_bytree: fraction of features used per tree.
gamma: minimum loss reduction required to make a split.
lambda, alpha: L2 and L1 regularization on weights.
How to Use XGBoost in Python
import xgboost as xgb
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load data (load_boston was removed from scikit-learn;
# the California housing dataset is a drop-in regression example)
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Train model
model = xgb.XGBRegressor(objective='reg:squarederror', n_estimators=100)
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.3f}")
Introduction to Support Vector Machines (SVM)
Support Vector Machine (SVM) is a supervised
machine learning algorithm used for classification,
regression, and even outlier detection. It is particularly
well-known for its effectiveness in binary
classification problems.
Key Concepts of SVM:
1. Hyperplane
A decision boundary that separates data points from
different classes.
In 2D, it's a line; in 3D, a plane; in higher
dimensions, it's called a hyperplane.
SVM aims to find the optimal hyperplane that
best separates the data.
2. Support Vectors
The data points closest to the hyperplane.
These points are critical because they directly
influence the position and orientation of the
hyperplane.
Removing them would change the decision
boundary.
3. Margin
The distance between the hyperplane and the
nearest data points (support vectors) on either side.
SVM seeks to maximize this margin for better
generalization.
4. Linearly Separable vs Non-linearly Separable
Data
For linearly separable data, SVM can find a
straight hyperplane.
For non-linearly separable data, SVM uses a
kernel trick to project data into higher dimensions
where a linear separator can exist.
The Kernel Trick:
Kernels allow SVM to operate in a higher-dimensional
space without explicitly computing the transformation.
Common kernels include:
Kernel Type: Function
Linear: K(x, x′) = xᵀx′
Polynomial: K(x, x′) = (xᵀx′ + c)^d
RBF (Gaussian): K(x, x′) = exp(−γ∥x − x′∥²)
Sigmoid: K(x, x′) = tanh(αxᵀx′ + c)
How SVM Works:
1. Input labeled training data (e.g., cats vs dogs).
2. Choose a kernel function (linear or non-linear).
3. Find the optimal hyperplane that maximizes the
margin between classes.
4. Classify new data based on which side of the
hyperplane they fall on.
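A small scikit-learn sketch of these steps (synthetic data; the kernel list mirrors the table above):
python
# Try different kernels and compare test accuracy
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

for kernel in ['linear', 'poly', 'rbf', 'sigmoid']:
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    print(kernel, "accuracy:", clf.score(X_test, y_test))

# The support vectors of the last-fitted model: the points nearest the boundary
print("Number of support vectors:", clf.support_vectors_.shape[0])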
Linear Support Vector Machine (Linear SVM):
A Linear Support Vector Machine (Linear SVM) is a
supervised machine learning algorithm used primarily
for binary classification tasks. It tries to find the best
separating hyperplane (a straight line in 2D, a plane in
3D, etc.) that maximizes the margin between two
classes.
Key Concepts
1. Hyperplane:
o A decision boundary that separates classes.
o In 2D: a line; in 3D: a plane; in higher
dimensions: a hyperplane.
2. Margin:
o The distance between the hyperplane and the
closest data points from each class.
o A larger margin generally indicates a better
generalization to unseen data.
3. Support Vectors:
o The data points that are closest to the
hyperplane.
o They "support" or define the position and
orientation of the hyperplane.
4. Linear Kernel:
o Linear SVM assumes that data is linearly
separable, meaning a straight line (or
hyperplane) can separate the classes.
Objective Function
The optimization problem is:
Minimize:
(1/2)∥w∥²
Subject to:
yᵢ(w·xᵢ + b) ≥ 1 for all i
Where:
w = weight vector (normal to the hyperplane)
b = bias term
xᵢ = feature vector for sample i
yᵢ = label (+1 or −1)
Common Uses
Text classification (e.g., spam detection)
Image classification
Bioinformatics (e.g., cancer detection)
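A minimal sketch with scikit-learn's LinearSVC, which solves the optimization problem above (the blob data is an illustrative linearly separable case):
python
# Linear SVM on linearly separable data; w and b define the hyperplane
from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

X, y = make_blobs(n_samples=200, centers=2, random_state=42)

clf = LinearSVC(C=1.0).fit(X, y)
print("w =", clf.coef_)        # weight vector (normal to the hyperplane)
print("b =", clf.intercept_)   # bias term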
Non-Linear Support Vector Machine:
A non-linear Support Vector Machine
(SVM) is an extension of the basic (linear)
SVM that can handle data that is not linearly
separable by transforming it into a higher-
dimensional space where a linear separation is
possible.
An SVM is a supervised machine learning
algorithm used for classification and
regression. It works by finding the optimal
hyperplane that separates data points of
different classes with the maximum margin.
Why Non-Linear?
In many real-world problems, the data is not
linearly separable. A straight line (in 2D) or
hyperplane (in higher dimensions) cannot
divide the classes perfectly.
✅ Example:
Imagine trying to classify this:
Class A (circles): ⭕
Class B (crosses): ❌
⭕ ❌
⭕
❌ ⭕
❌
⭕ ❌
No straight line can separate them well — so
we need to go non-linear.
How It Works: The Kernel Trick
Instead of explicitly computing the
coordinates in higher dimensions, SVM uses a
kernel function to compute the dot product
in that space.
This allows the algorithm to fit the maximum-
margin hyperplane in a transformed feature
space.
Kernel Type: Function
Linear: K(x, x′) = xᵀx′
Polynomial: K(x, x′) = (xᵀx′ + c)^d
RBF (Gaussian): K(x, x′) = exp(−γ∥x − x′∥²)
Sigmoid: K(x, x′) = tanh(αxᵀx′ + c)
Real-World Applications
Image classification
Handwriting recognition (e.g., MNIST)
Bioinformatics (e.g., gene classification)
Text classification (e.g., spam detection)
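To see the point of the circles/crosses picture in code: scikit-learn's make_circles data is not linearly separable, so a linear kernel struggles while an RBF kernel fits well (the gamma value is illustrative):
python
# Non-linear SVM: RBF kernel vs linear kernel on concentric circles
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, factor=0.4, noise=0.1, random_state=42)

linear = SVC(kernel='linear').fit(X, y)
rbf = SVC(kernel='rbf', gamma=2.0).fit(X, y)
print("linear kernel accuracy:", linear.score(X, y))  # roughly chance level
print("rbf kernel accuracy:", rbf.score(X, y))        # near-perfect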
Neural Networks and Deep Learning:
Introduction to Neural Networks:
Neural networks are a class of algorithms
inspired by the structure and functioning of the
human brain.
They form the foundation of many modern
artificial intelligence (AI) systems,
particularly in areas such as image
recognition, natural language processing,
and game playing.
What Is a Neural Network?
A neural network is a computational model
made up of layers of interconnected nodes (or
"neurons") that can learn to recognize patterns
from data.
Biological Analogy:
Neurons: Like brain cells, each node
processes input and passes on its output.
Synapses: Connections between nodes,
each with a weight representing its
strength.
Learning: Adjusting the weights to
improve predictions (akin to learning from
experience).
Components of a Neural Network
1. Input Layer
o Receives the input features (e.g., pixel
values of an image, text data).
2. Hidden Layers
o Perform computations and
transformations.
o Each node applies an activation
function to its inputs (like ReLU,
sigmoid).
3. Output Layer
o Produces the final prediction (e.g.,
class label, value).
How Does It Work?
The neural network processes input data in the
following steps:
1. Forward Propagation
o Data moves from input → hidden
layers → output.
o Each neuron computes a weighted sum
of inputs and passes it through an
activation function.
2. Loss Calculation
o The output is compared with the true
value using a loss function (e.g., Mean
Squared Error, Cross-Entropy).
3. Backpropagation
o The network calculates the gradient of
the loss with respect to each weight
using calculus (chain rule).
o Uses this to update the weights
(usually via gradient descent).
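A tiny NumPy sketch of one neuron's forward pass and chain-rule update, to make steps 1–3 concrete (the data, sigmoid activation, squared-error loss, and learning rate are all illustrative choices):
python
# One sigmoid neuron trained by gradient descent on a single example
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)   # input features
w = rng.normal(size=3)   # weights (synapse strengths)
b = 0.0                  # bias
target = 1.0             # true value

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(100):
    z = w @ x + b                  # forward: weighted sum of inputs
    y_hat = sigmoid(z)             # forward: activation function
    loss = 0.5 * (y_hat - target) ** 2
    # backpropagation (chain rule): dL/dz = dL/dy_hat * dy_hat/dz
    grad_z = (y_hat - target) * y_hat * (1 - y_hat)
    w -= 0.5 * grad_z * x          # gradient descent weight update
    b -= 0.5 * grad_z

print(f"final prediction: {sigmoid(w @ x + b):.3f}, loss: {loss:.5f}")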
Types of Neural Networks
Type: Use Case
Feedforward Neural Network: basic tasks (e.g., regression, classification)
Convolutional Neural Network (CNN): image and video recognition
Recurrent Neural Network (RNN): time-series or sequential data (e.g., language)
Transformer Networks: modern NLP models (e.g., ChatGPT)
Example Use Cases
Image Classification (e.g., recognizing cats vs
dogs)
Speech Recognition
Machine Translation
Medical Diagnosis
Stock Price Prediction
Why Are Neural Networks Powerful?
Can model nonlinear and complex relationships
Perform end-to-end learning from raw data
Scalable to large datasets and tasks
Building and training neural networks
using TensorFlow and Keras:
TensorFlow framework:
TensorFlow is an open-source machine
learning and deep learning framework
developed by Google.
It allows developers and researchers to build
and train machine learning models,
particularly neural networks, for a wide
variety of tasks such as:
Image recognition
Natural language processing
Time series prediction
Reinforcement learning
And many more
Key Features of TensorFlow:
Flexible architecture: Can run on CPUs,
GPUs, and TPUs (specialized hardware).
High-level APIs: Such as Keras, which
makes it easier to build and train models.
Scalable: Can run on everything from a
single smartphone to a large-scale
distributed system.
Support for production: TensorFlow
Serving, TensorFlow Lite, and
TensorFlow.js make it usable on servers,
mobile devices, and the web.
Example Use Case:
python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# Simple neural network for classification
model = Sequential([
Dense(64, activation='relu', input_shape=(100,)),
Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='categorical_crossentropy')
Keras Framework:
Keras is an open-source deep learning framework that
makes it easier to build and train neural networks.
It provides a user-friendly, high-level API designed to
work with TensorFlow, which is a more complex backend
framework for numerical computation.
Key Points about Keras:
High-level API: Keras is designed for human beings,
not machines.
It prioritizes ease of use, simplicity, and fast
prototyping.
Runs on top of TensorFlow: As of 2020, Keras is
tightly integrated into TensorFlow and is officially
part of it (tf.keras).
Modular and Extensible: You can build complex
neural network models using a modular approach —
layers, optimizers, loss functions, and metrics are all
easily configurable.
Used for deep learning: Applications include image
classification, natural language processing, time series
forecasting, etc.
Example:
Here's a simple Keras model for binary
classification:
python
import tensorflow as tf
from tensorflow.keras import layers, models
model = models.Sequential([
layers.Dense(64, activation='relu', input_shape=(10,)),
layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])
# Example data
import numpy as np
x_train = np.random.rand(1000, 10)
y_train = np.random.randint(0, 2, size=(1000,))
model.fit(x_train, y_train, epochs=10, batch_size=32)
This trains a simple neural network with:
1 hidden layer of 64 neurons (ReLU activation)
1 output layer (sigmoid for binary output)
TensorFlow, activation functions:
In TensorFlow, activation functions are crucial
components of neural networks that introduce non-
linearity, allowing the network to learn complex
patterns.
Here's a list of commonly used activation functions
in TensorFlow (tf.nn or tf.keras.activations), along
with brief descriptions:
Common Activation Functions
Activation: Description. TensorFlow Function
ReLU (Rectified Linear Unit): sets all negative values to 0; the most common choice due to simplicity and efficiency. Function: tf.nn.relu(x) or tf.keras.activations.relu(x)
Sigmoid: squashes input to the range (0, 1); often used in binary classification. Function: tf.nn.sigmoid(x) or tf.keras.activations.sigmoid(x)
Tanh (Hyperbolic Tangent): squashes input to (−1, 1); zero-centered, unlike sigmoid. Function: tf.nn.tanh(x) or tf.keras.activations.tanh(x)
Softmax: converts logits into probabilities that sum to 1; used in multi-class classification (last layer). Function: tf.nn.softmax(x) or tf.keras.activations.softmax(x)
Leaky ReLU: variant of ReLU that allows a small gradient when inactive. Function: tf.nn.leaky_relu(x, alpha=0.2)
ELU (Exponential Linear Unit): similar to ReLU, but smoother and has negative outputs. Function: tf.nn.elu(x)
SELU (Scaled ELU): self-normalizing variant of ELU; used in self-normalizing networks. Function: tf.nn.selu(x)
GELU (Gaussian Error Linear Unit): smooth approximation of ReLU; used in Transformers. Function: tf.nn.gelu(x)
Swish: x · sigmoid(x); often performs better than ReLU in deeper models. Function: tf.nn.swish(x) or tf.keras.activations.swish(x)
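A quick numeric check of a few of these functions on a sample tensor (the input values are illustrative):
python
import tensorflow as tf

x = tf.constant([-2.0, -0.5, 0.0, 1.0, 3.0])
print(tf.nn.relu(x).numpy())     # negatives clipped to 0
print(tf.nn.sigmoid(x).numpy())  # squashed into (0, 1)
print(tf.nn.tanh(x).numpy())     # squashed into (-1, 1)
print(tf.nn.softmax(x).numpy())  # probabilities summing to 1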
🔹 How to Use Them in Models
With tf.keras.layers.Dense:
python
import tensorflow as tf
model = tf.keras.Sequential([
tf.keras.layers.Dense(64, activation='relu'),
tf.keras.layers.Dense(1, activation='sigmoid')
])
Or with the functional API:
python
x = tf.keras.Input(shape=(10,))
y = tf.keras.layers.Dense(32, activation=tf.nn.elu)(x)
Example: TensorFlow.js
TensorFlow.js is a JavaScript library for training and deploying machine learning models in the browser.
Tensorflow Models
Models and Layers are important building blocks
in Machine Learning.
For different Machine Learning tasks you must
combine different types of Layers into a Model that
can be trained with data to predict future values.
TensorFlow.js supports different types of Models and Layers.
A TensorFlow Model is a Neural Network with one
or more Layers.
A Tensorflow Project
A Tensorflow project has this typical workflow:
Collecting Data
Creating a Model
Adding Layers to the Model
Compiling the Model
Training the Model
Using the Model
Example
Suppose you knew a function that defined a straight line:
Y = 1.2X + 5
Then you could calculate any y value with the
JavaScript formula:
y = 1.2 * x + 5;
To demonstrate Tensorflow.js, we could train a
Tensorflow.js model to predict Y values based on X
inputs.
Note
The TensorFlow model does not know the function.
const xs = tf.tensor([0, 1, 2, 3, 4]);
const ys = xs.mul(1.2).add(5);

// Define a Linear Regression Model
const model = tf.sequential();
model.add(tf.layers.dense({units: 1, inputShape: [1]}));

// Specify Loss and Optimizer
model.compile({loss: 'meanSquaredError', optimizer: 'sgd'});

// Train the Model
model.fit(xs, ys, {epochs: 500}).then(() => {myFunction()});

// Use the Model
function myFunction() {
  const xMax = 10;
  const xArr = [];
  const yArr = [];
  for (let x = 0; x <= xMax; x++) {
    let result = model.predict(tf.tensor([Number(x)]));
    result.data().then(y => {
      xArr.push(x);
      yArr.push(Number(y));
      // plot() is assumed to be a page-provided plotting helper
      if (x == xMax) {plot(xArr, yArr)};
    });
  }
}
What is Keras?
Keras is a deep learning API that simplifies the
process of building deep neural networks.
Initially developed as an independent library, Keras is now tightly integrated into TensorFlow as its official high-level API.
Historically it supported multiple backend engines, including TensorFlow, Theano, and Microsoft Cognitive Toolkit (CNTK).
Keras makes it easier to train and evaluate deep
learning models without requiring extensive
knowledge of low-level operations.
How to Install Keras?
1. Since Keras is now part of TensorFlow, it can be
installed easily using pip:
pip install tensorflow
This command installs TensorFlow 2.x, which
includes Keras.
2. To check the installation, open Python and run:
import tensorflow as tf
print('TensorFlow Version: ', tf.__version__)
print('Keras Version ', tf.keras.__version__)
Output:
TensorFlow Version: 2.18.0
Keras Version 3.8.0
How to Build a Model in Keras?
Keras provides two main ways to build models:
1. Sequential API
2. Functional API
The Sequential API is best suited to models with a single input, a single output, and a linear stack of layers. The Functional API is used for models that require multiple inputs or outputs, or layers that themselves have multiple inputs or outputs.
Building Model using Sequential API
Here’s how you can define a Sequential model:
We create a Sequential model.
Add a fully connected (Dense) layer with 64
units and ReLU activation.
Add another Dense layer with 10 units (for
classification) and a Softmax activation.
from keras.models import Sequential
from keras.layers import Dense, Activation
model = Sequential()
model.add(Dense(units=64, input_dim=100))
model.add(Activation('relu'))
model.add(Dense(units=10))
model.add(Activation('softmax'))
Building Model using Functional API
Functional API allows more flexibility in creating
complex architectures. You can create models with
shared layers, multiple inputs/outputs and skip
connections.
For example:
We define two input layers (input1 and input2).
Create separate hidden layers for each input.
Merge the hidden layers using the concatenate
function.
Finally, add an output layer with softmax activation.
from keras.layers import Input, Dense, concatenate
from keras.models import Model
input1 = Input(shape=(100,))
input2 = Input(shape=(50,))
hidden1 = Dense(64, activation='relu')(input1)
hidden2 = Dense(32, activation='relu')(input2)
merged = concatenate([hidden1, hidden2])
output = Dense(10, activation='softmax')(merged)
model = Model(inputs=[input1, input2],
outputs=output)
Example of keras API:
Here's a simple example of using the Keras API (a
high-level API of TensorFlow) to build, compile, and
train a neural network for classifying digits from the
MNIST dataset (handwritten digits 0–9). This is a
classic example in machine learning.
Keras API Example: MNIST Digit Classifier
python
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

# 1. Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# 2. Normalize the input data (scale pixels from 0-255 to 0-1)
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# Flatten the images from 28x28 to 784 (for a fully connected network)
x_train = x_train.reshape((x_train.shape[0], 28 * 28))
x_test = x_test.reshape((x_test.shape[0], 28 * 28))

# One-hot encode the labels (e.g., 3 becomes [0,0,0,1,0,0,0,0,0,0])
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

# 3. Build the neural network model using the Keras Sequential API
model = models.Sequential([
    layers.Dense(512, activation='relu', input_shape=(784,)),  # hidden layer with 512 neurons
    layers.Dense(10, activation='softmax')  # output layer with 10 neurons (10 classes)
])

# 4. Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# 5. Train the model
model.fit(x_train, y_train, epochs=5, batch_size=128, validation_split=0.1)

# 6. Evaluate the model on test data
test_loss, test_accuracy = model.evaluate(x_test, y_test)
print(f'Test accuracy: {test_accuracy:.4f}')

# 7. Make predictions
predictions = model.predict(x_test)

# 8. Show some predictions (reshape flattened images back to 28x28 for display;
# labels were one-hot encoded, so recover the digit with argmax)
for i in range(5):
    plt.imshow(x_test[i].reshape(28, 28), cmap='gray')
    plt.title(f"Predicted: {np.argmax(predictions[i])}, "
              f"Actual: {np.argmax(y_test[i])}")
    plt.show()
🧠 Key Concepts:
Sequential API: Used to build models layer-by-
layer.
Dense layer: Fully connected neural network
layer.
Activation functions: ReLU for hidden layers,
softmax for multi-class output.
Loss function: categorical_crossentropy for
multi-class classification.
Optimizer: adam, a widely used gradient
descent optimizer.
The MNIST dataset (Modified National Institute of Standards and
Technology database) is a benchmark dataset in machine learning
and computer vision, especially used for image classification
tasks.
📚 Overview of the MNIST Dataset
Feature: Description
Type: handwritten digit images (0 to 9)
Number of classes: 10 (digits 0 through 9)
Image size: 28 × 28 pixels (grayscale)
Number of training samples: 60,000
Number of test samples: 10,000
Data format: each image is a 28×28 pixel grayscale image (784 features if flattened)
Label format: a single digit (0–9)
Example Images from MNIST
Each image looks like a small, pixelated version of a handwritten
digit. Here's a visual example (you'd need to plot it in Python to
see):
python
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import mnist
(x_train, y_train), _ = mnist.load_data()
plt.imshow(x_train[0], cmap='gray')
plt.title(f"Label: {y_train[0]}")
plt.show()
Convolutional Neural Networks (CNNs):
Convolutional Neural Networks (CNNs) are a class
of deep neural networks that are particularly
effective for processing data with a grid-like
topology, such as images.
They're most commonly used in computer vision
tasks like image classification, object detection,
segmentation, and more.
Key Concepts of CNNs
1. Convolutional Layer
The core building block of a CNN.
Applies filters (kernels) that slide over the input
data to detect features like edges, textures, or
shapes.
Each filter produces a feature map (activation
map) that highlights where that feature occurs.
2. ReLU (Rectified Linear Unit)
Applies a non-linear activation function to
introduce non-linearity.
Commonly: ReLU(x) = max(0, x)
3. Pooling Layer
Downsamples the feature maps to reduce dimensionality and computation.
Common types:
o Max Pooling: takes the maximum value in
each window.
o Average Pooling: takes the average.
4. Fully Connected (Dense) Layer
After several convolution and pooling layers, the
output is flattened and passed through one or
more dense layers for classification or regression
tasks.
5. Softmax / Sigmoid (Output Layer)
Used in the final layer depending on the task:
o Softmax for multi-class classification.
o Sigmoid for binary classification.
Typical CNN Architecture
Input Image → [Conv → ReLU → Pool]*N →
Flatten → Dense → Output
Example: Image Classification
For a handwritten digit recognition task (like
MNIST), a CNN might:
Detect edges in the first layer.
Identify digit parts (like curves or lines) in
deeper layers.
Combine these to classify the final digit.
Libraries to Use CNNs
TensorFlow / Keras
PyTorch
OpenCV (for pre-processing and some model
loading)
CNN Example in Python
(TensorFlow/Keras):
import tensorflow as tf
from tensorflow.keras import layers, models

# Load and preprocess the MNIST dataset
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Normalize pixel values to 0-1 and reshape for CNN input
x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255
x_test = x_test.reshape(-1, 28, 28, 1).astype("float32") / 255

# Build the CNN model
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')  # 10 classes for digits 0-9
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
history = model.fit(x_train, y_train, epochs=5, validation_split=0.1)

# Evaluate the model
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test accuracy: {test_acc:.4f}")
What This Does:
Conv2D: Applies filters to detect features.
MaxPooling2D: Reduces spatial dimensions.
Flatten: Converts 2D data to 1D for the dense layer.
Dense: Fully connected layers for classification.
Recurrent Neural Networks (RNN).
What is an RNN?
A Recurrent Neural Network (RNN) is a
type of neural network designed to work with
sequential data, such as:
Text
Time series (like stock prices)
Audio
Video frames
RNNs remember past information using
loops in the network, which makes them good
at modeling sequences and time dependencies.
How Do RNNs Work?
Unlike traditional neural networks (like
feedforward networks), RNNs have memory
of previous inputs.
For a sequence of inputs:
x₁, x₂, x₃, ..., xₜ
An RNN processes each input step by step:
At time t:
Input: xₜ
Hidden state: hₜ = f(Wxₜ + Uhₜ₋₁ + b)
Output: yₜ = g(hₜ)
hₜ: hidden state (the network's memory)
W: input weight matrix (applied to the input xₜ)
U: recurrent weight matrix (applied to the previous hidden state hₜ₋₁)
f: activation function (e.g., tanh or ReLU)
g: output function
This loop allows RNNs to carry information
from earlier in the sequence to later steps.
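A tiny NumPy sketch of this recurrence with random, untrained weights, just to show how hₜ carries information forward step by step (the sequence and layer sizes are illustrative):
python
# One pass of h_t = f(W*x_t + U*h_(t-1) + b) over a 3-step sequence
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=4)        # input weights (1 feature -> 4 hidden units)
U = rng.normal(size=(4, 4))   # recurrent (hidden-to-hidden) weights
b = np.zeros(4)

h = np.zeros(4)               # initial hidden state
for x_t in [0.5, -1.0, 2.0]:  # the sequence x1, x2, x3
    h = np.tanh(W * x_t + U @ h + b)   # new state mixes input and memory
    print(h)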
Applications of RNNs
Language modeling and generation
(e.g., GPT-style models started with RNN
ideas)
Machine translation
Speech recognition
Stock price prediction
Music generation
Example of a Recurrent Neural Network (RNN):
Below is an example of an RNN in Python with TensorFlow/Keras. It shows how to use an RNN for sequence prediction, specifically predicting the next number in a sequence.
Goal:
Given a sequence of numbers, the RNN learns
to predict the next number.
For example, given [1, 2, 3], predict 4.
Step-by-Step RNN Example in Python
(with TensorFlow/Keras)
python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

# 1. Prepare the data
# Input sequences (X): [1, 2, 3], [2, 3, 4], ..., [7, 8, 9]
# Target values (y): 4, 5, ..., 10
X = []
y = []
for i in range(1, 8):
    X.append([i, i+1, i+2])
    y.append(i + 3)

# Convert to NumPy arrays
X = np.array(X)
y = np.array(y)

# Reshape X to [samples, time steps, features]
X = X.reshape((X.shape[0], X.shape[1], 1))

# 2. Define the RNN model
model = Sequential([
    SimpleRNN(10, activation='relu', input_shape=(3, 1)),
    Dense(1)
])

# 3. Compile and train
model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=200, verbose=0)

# 4. Predict
test_input = np.array([[8, 9, 10]]).reshape((1, 3, 1))
predicted = model.predict(test_input, verbose=0)
print(f"Predicted next number after [8, 9, 10]: {predicted[0][0]:.2f}")
Explanation:
Input shape: (samples, time_steps,
features) — here, each sequence has 3
time steps, with 1 feature per step.
SimpleRNN: A basic RNN layer that
processes sequences.
Dense: Fully connected output layer
producing a single number.