Module-4
Advanced Machine Learning Techniques:
Ensemble Methods:
Ensemble Learning is a machine learning paradigm in which multiple models (often called "weak learners") are trained and combined to solve the same problem, achieving better performance than any single model alone.
Here are the main ensemble learning techniques:
✅ 1. Bagging (Bootstrap Aggregating)
Goal: Reduce variance (overfitting).
How: Train multiple models on different
random subsets (with replacement) of the
training data, and average (or vote) their
predictions.
Common Algorithm:
o Random Forest (ensemble of decision trees)
Use case: High-variance models like decision
trees.
✅ 2. Boosting
Goal: Reduce bias (underfitting).
How: Train models sequentially; each model
learns from the mistakes of the previous one by
giving more weight to misclassified instances.
Popular Algorithms:
o AdaBoost (Adaptive Boosting)
o Gradient Boosting Machines (GBM)
o XGBoost, LightGBM, CatBoost (optimized
implementations)
Use case: Structured/tabular data with
moderate to large size.
✅ 3. Stacking (Stacked Generalization)
Goal: Combine multiple diverse models using a
meta-model.
How: Train different models on the training set,
then use their predictions as input features to
train a higher-level model (meta-learner).
Example:
o Level-0 models: SVM, Logistic Regression,
Random Forest
o Level-1 model (meta-model): Logistic
Regression
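A minimal sketch of this level-0/level-1 setup using scikit-learn's StackingClassifier (the dataset, estimator labels, and hyperparameters here are illustrative, not from the notes):
python
# Stacking: level-0 models feed their predictions to a level-1 meta-model
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Level-0 models: SVM, Logistic Regression, Random Forest
level0 = [('svm', SVC(probability=True, random_state=42)),
          ('lr', LogisticRegression(max_iter=1000)),
          ('rf', RandomForestClassifier(random_state=42))]

# Level-1 meta-model: Logistic Regression trained on level-0 predictions
stack = StackingClassifier(estimators=level0,
                           final_estimator=LogisticRegression(max_iter=1000))
stack.fit(X_train, y_train)
print("Stacking accuracy:", stack.score(X_test, y_test))
By default, StackingClassifier generates the level-0 predictions with internal cross-validation, which is the detail that distinguishes stacking from blending below.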
✅ 4. Voting
Goal: Simple combination of predictions from
different models.
Types:
o Hard Voting: Majority class wins.
o Soft Voting: Average predicted probabilities
(requires probability outputs).
Use case: When you have a mix of models and
want a quick ensemble.
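A quick sketch with scikit-learn's VotingClassifier; the particular model mix is an arbitrary illustration:
python
# Voting: combine heterogeneous models by majority vote or averaged probabilities
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, random_state=42)

clf = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=1000)),
                ('rf', RandomForestClassifier(random_state=42)),
                ('nb', GaussianNB())],
    voting='soft')  # 'hard' = majority class; 'soft' = average probabilities
clf.fit(X, y)
print(clf.predict(X[:5]))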
✅ 5. Blending
Similar to stacking but uses a hold-out
validation set instead of cross-validation to train
the meta-model.
Faster but may lead to overfitting if the
validation set is too small.
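Since blending is usually hand-rolled rather than provided as a library class, here is a sketch under illustrative choices of base models and hold-out size:
python
# Blending: base models fit on the training split; the meta-model fits
# on their predictions over a separate hold-out split
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)

base_models = [SVC(probability=True, random_state=42),
               RandomForestClassifier(random_state=42)]
for m in base_models:
    m.fit(X_train, y_train)

# Hold-out predictions become the meta-model's input features
meta_features = np.column_stack([m.predict_proba(X_hold)[:, 1]
                                 for m in base_models])
meta_model = LogisticRegression().fit(meta_features, y_hold)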
Later sections cover two boosting implementations in depth: Gradient Boosting Machines (GBM) and Extreme Gradient Boosting (XGBoost).
Ensemble Methods: Bagging and Boosting:
Ensemble methods are machine learning techniques
that combine predictions from multiple models to
improve performance, reduce variance, and prevent
overfitting. Two of the most popular ensemble
methods are Bagging and Boosting.
1. Bagging (Bootstrap Aggregating)
Bagging reduces variance (overfitting).
Key Idea: Train multiple models independently on
different random subsets of the data and aggregate
their predictions (e.g., by voting or averaging).
Process:
1. Generate multiple bootstrap samples
(random samples with replacement) from
the training data.
2. Train a separate model (usually the same
type, like decision trees) on each sample.
3. Combine predictions:
Classification: majority vote
Regression: average
Goal: Reduce variance and improve model
stability.
Example Algorithms:
o Random Forest (Bagging applied to decision
trees, with additional randomness in feature
selection)
Advantages:
o Handles overfitting well (especially for high-
variance models).
o Easy to parallelize.
Disadvantages:
o Doesn’t reduce bias (underfitting) significantly.
o Large ensembles can be slow in prediction
time.
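A minimal sketch with scikit-learn's BaggingClassifier (in releases before 1.2 the first argument is named base_estimator rather than estimator; all values here are illustrative):
python
# Bagging: many trees, each trained on a bootstrap sample, vote on the class
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=42)

bag = BaggingClassifier(estimator=DecisionTreeClassifier(),
                        n_estimators=50,   # number of bootstrap models
                        bootstrap=True,    # sample with replacement
                        random_state=42)
bag.fit(X, y)
print(bag.predict(X[:5]))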
2. Boosting
Boosting is used to reduce bias (underfitting).
Key Idea: Train models sequentially, where each
new model focuses on correcting the errors of the
previous ones.
Process:
1. Start with an initial weak model (e.g., a
shallow decision tree).
2. Train the next model to focus more on data
points that were mispredicted by the
previous model.
3. Combine the models using a weighted
majority or sum of predictions.
Goal: Reduce bias and build a strong predictive
model from weak learners.
Popular Boosting Algorithms:
o AdaBoost (Adaptive Boosting)
o Gradient Boosting Machines (GBM)
o XGBoost
Advantages:
Often achieves higher accuracy than bagging,
especially on structured/tabular data.
Can handle bias better.
Disadvantages:
More prone to overfitting if not carefully
regularized.
Sequential process is harder to parallelize.
More sensitive to noisy data and outliers.
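A short AdaBoost sketch in scikit-learn, where each depth-1 stump reweights the points its predecessors misclassified (hyperparameters are illustrative; older scikit-learn versions name the first argument base_estimator):
python
# Boosting: sequential weak learners, each focusing on previous mistakes
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=42)

ada = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                         n_estimators=100,
                         learning_rate=0.5,
                         random_state=42)
ada.fit(X, y)
print("Training accuracy:", ada.score(X, y))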
Gradient Boosting Machines (GBM)
Gradient Boosting Machines (GBM) are a powerful
ensemble machine learning technique used for both
regression and classification tasks.
GBM builds models in a sequential, stage-wise
fashion by combining multiple weak learners,
typically decision trees, into a strong predictive
model.
Core Idea of GBM
GBM minimizes a loss function by adding models
(e.g., trees) that correct the errors (residuals) made
by previous models.
The algorithm uses gradient descent to optimize the
model step-by-step.
How GBM Works (Step-by-Step)
1. Initialize the model:
o Start with a simple model (e.g., mean of
targets for regression).
2. Compute residuals:
o Calculate the difference between the actual
and predicted values.
3. Fit a weak learner:
o Train a shallow decision tree (often with
depth 1–5) on the residuals.
4. Update the model:
o Add the new tree's predictions to the
existing model with a learning rate η.
o New prediction: Fₘ(x) = Fₘ₋₁(x) + η·hₘ(x)
5. Repeat:
Continue this process for a specified number of
iterations or until convergence.
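To make steps 1–5 concrete before the library example later in this section, here is a from-scratch sketch for regression with squared loss, where the residuals coincide with the negative gradient (the toy data and hyperparameters are illustrative):
python
# Hand-rolled gradient boosting for regression: each tree fits the
# current residuals, and its (scaled) predictions are added to the model
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

eta = 0.1                               # learning rate
F = np.full_like(y, y.mean())           # 1. initialize with the mean
for _ in range(100):
    residuals = y - F                   # 2. residuals (negative MSE gradient)
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)  # 3. weak learner
    F += eta * tree.predict(X)          # 4. F_m(x) = F_(m-1)(x) + eta * h_m(x)

print("Training MSE:", np.mean((y - F) ** 2))  # 5. repeat until done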
Key Components
Loss function: Determines what the model is
optimizing (e.g., MSE for regression, log-loss for
classification).
Weak learner: Usually decision trees.
Learning rate η: Controls the contribution of
each tree.
Number of trees: More trees can improve
accuracy but increase training time and risk
overfitting.
Subsampling: Often used (as in Stochastic
GBM) to improve generalization and reduce
overfitting.
Advantages
Handles non-linear relationships well.
Highly accurate and flexible.
Can work with different types of loss functions.
Supports both classification and regression.
Disadvantages
Computationally intensive (training is slower).
Can overfit if not properly tuned.
Less interpretable than linear models.
Example of a Gradient Boosting
Machine (GBM)
Problem: Predict whether a customer will buy a
product (binary classification)
We’ll use sklearn.datasets to generate a synthetic dataset.
Code Example (Gradient Boosting
Classifier):
python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# 1. Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=15, n_redundant=5,
                           random_state=42)

# 2. Split into training and testing data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)

# 3. Initialize the Gradient Boosting Classifier
gbm = GradientBoostingClassifier(n_estimators=100,   # number of boosting stages
                                 learning_rate=0.1,  # shrinkage rate
                                 max_depth=3,        # depth of each tree
                                 random_state=42)

# 4. Fit the model
gbm.fit(X_train, y_train)

# 5. Predict on test data
y_pred = gbm.predict(X_test)

# 6. Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
Extreme Gradient Boosting (XGBoost):
Extreme Gradient Boosting (XGBoost) is a powerful
and efficient implementation of the gradient
boosting framework.
It's widely used in machine learning competitions
(like those on Kaggle) and industry applications due
to its high performance, scalability, and ability to
handle various data types and problem settings.
What is XGBoost?
XGBoost stands for Extreme Gradient Boosting.
It is a decision-tree-based ensemble Machine
Learning algorithm that uses a gradient boosting
framework.
Developed by Tianqi Chen, it aims to be highly
efficient, flexible, and portable.
How It Works
XGBoost builds an ensemble (collection) of decision
trees, where each new tree attempts to correct the errors
of the previous ones.
1. Initial Prediction: Start with an initial model (like
predicting the mean).
2. Compute Residuals: Find the difference between
predicted and actual values.
3. Train a Tree: Train a tree to predict the residuals
(errors).
4. Update Model: Add the predictions from the new
tree to improve the model.
5. Repeat: Continue adding trees until stopping
criteria are met.
This is similar to traditional gradient boosting but with
significant performance improvements.
Key Features of XGBoost
Feature: Description
Regularization: helps prevent overfitting using L1 (Lasso) and L2 (Ridge) penalties.
Parallel Processing: speeds up training using multi-core processing.
Tree Pruning: uses max depth rather than max number of nodes (pre-pruning).
Handling Missing Data: automatically learns the best way to handle missing values.
Custom Loss Functions: allows user-defined objective functions.
Sparsity Awareness: optimized for sparse (missing) input data.
Cross-validation: built-in support for k-fold cross-validation.
Key Parameters of XGBoost
Parameter: Description
n_estimators: number of boosting rounds (trees).
max_depth: maximum depth of a tree.
learning_rate: step size shrinkage (also called eta).
subsample: fraction of data used per tree.
colsample_bytree: fraction of features used per tree.
gamma: minimum loss reduction required to make a split.
lambda, alpha: L2 and L1 regularization on weights.
How to Use XGBoost in Python
import xgboost as xgb
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load data (load_boston was removed from scikit-learn;
# the California housing dataset is a drop-in regression example)
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Train model
model = xgb.XGBRegressor(objective='reg:squarederror', n_estimators=100)
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.3f}")
Introduction to Support Vector Machines (SVM)
Support Vector Machine (SVM) is a supervised
machine learning algorithm used for classification,
regression, and even outlier detection. It is particularly
well-known for its effectiveness in binary
classification problems.
Key Concepts of SVM:
1. Hyperplane
A decision boundary that separates data points from
different classes.
In 2D, it's a line; in 3D, a plane; in higher
dimensions, it's called a hyperplane.
SVM aims to find the optimal hyperplane that
best separates the data.
2. Support Vectors
The data points closest to the hyperplane.
These points are critical because they directly
influence the position and orientation of the
hyperplane.
Removing them would change the decision
boundary.
3. Margin
The distance between the hyperplane and the
nearest data points (support vectors) on either side.
SVM seeks to maximize this margin for better
generalization.
4. Linearly Separable vs Non-linearly Separable
Data
For linearly separable data, SVM can find a
straight hyperplane.
For non-linearly separable data, SVM uses a
kernel trick to project data into higher dimensions
where a linear separator can exist.
The Kernel Trick:
Kernels allow SVM to operate in a higher-dimensional
space without explicitly computing the transformation.
Common kernels include:
Kernel Type: Function
Linear: K(x, x′) = xᵀx′
Polynomial: K(x, x′) = (xᵀx′ + c)^d
RBF (Gaussian): K(x, x′) = exp(−γ∥x − x′∥²)
Sigmoid: K(x, x′) = tanh(αxᵀx′ + c)
How SVM Works:
1. Input labeled training data (e.g., cats vs dogs).
2. Choose a kernel function (linear or non-linear).
3. Find the optimal hyperplane that maximizes the
margin between classes.
4. Classify new data based on which side of the
hyperplane they fall on.
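A small scikit-learn sketch of these steps (synthetic data; the kernel list mirrors the table above):
python
# Try different kernels and compare test accuracy
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

for kernel in ['linear', 'poly', 'rbf', 'sigmoid']:
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    print(kernel, "accuracy:", clf.score(X_test, y_test))

# The support vectors of the last-fitted model: the points nearest the boundary
print("Number of support vectors:", clf.support_vectors_.shape[0])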
Linear Support Vector Machine (Linear SVM):
A Linear Support Vector Machine (Linear SVM) is a
supervised machine learning algorithm used primarily
for binary classification tasks. It tries to find the best
separating hyperplane (a straight line in 2D, a plane in
3D, etc.) that maximizes the margin between two
classes.
Key Concepts
1. Hyperplane:
o A decision boundary that separates classes.
o In 2D: a line; in 3D: a plane; in higher
dimensions: a hyperplane.
2. Margin:
o The distance between the hyperplane and the
closest data points from each class.
o A larger margin generally indicates a better
generalization to unseen data.
3. Support Vectors:
o The data points that are closest to the
hyperplane.
o They "support" or define the position and
orientation of the hyperplane.
4. Linear Kernel:
o Linear SVM assumes that data is linearly
separable, meaning a straight line (or
hyperplane) can separate the classes.
Objective Function
The optimization problem is:
Minimize:
(1/2)∥w∥²
Subject to:
yᵢ(w·xᵢ + b) ≥ 1 for all i
Where:
w = weight vector (normal to the hyperplane)
b = bias term
xᵢ = feature vector for sample i
yᵢ = label (+1 or −1)
Common Uses
Text classification (e.g., spam detection)
Image classification
Bioinformatics (e.g., cancer detection)
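A minimal sketch with scikit-learn's LinearSVC, which solves the optimization problem above (the blob data is an illustrative linearly separable case):
python
# Linear SVM on linearly separable data; w and b define the hyperplane
from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

X, y = make_blobs(n_samples=200, centers=2, random_state=42)

clf = LinearSVC(C=1.0).fit(X, y)
print("w =", clf.coef_)        # weight vector (normal to the hyperplane)
print("b =", clf.intercept_)   # bias term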
Non-Linear Support Vector Machine:
A non-linear Support Vector Machine
(SVM) is an extension of the basic (linear)
SVM that can handle data that is not linearly
separable by transforming it into a higher-
dimensional space where a linear separation is
possible.
An SVM is a supervised machine learning
algorithm used for classification and
regression. It works by finding the optimal
hyperplane that separates data points of
different classes with the maximum margin.
Why Non-Linear?
In many real-world problems, the data is not
linearly separable. A straight line (in 2D) or
hyperplane (in higher dimensions) cannot
divide the classes perfectly.
✅ Example:
Imagine trying to classify this:
Class A (circles): ⭕
Class B (crosses): ❌
⭕ ❌
⭕
❌ ⭕
❌
⭕ ❌
No straight line can separate them well — so
we need to go non-linear.
How It Works: The Kernel Trick
Instead of explicitly computing the
coordinates in higher dimensions, SVM uses a
kernel function to compute the dot product
in that space.
This allows the algorithm to fit the maximum-
margin hyperplane in a transformed feature
space.
Kernel Type: Function
Linear: K(x, x′) = xᵀx′
Polynomial: K(x, x′) = (xᵀx′ + c)^d
RBF (Gaussian): K(x, x′) = exp(−γ∥x − x′∥²)
Sigmoid: K(x, x′) = tanh(αxᵀx′ + c)
Real-World Applications
Image classification
Handwriting recognition (e.g., MNIST)
Bioinformatics (e.g., gene classification)
Text classification (e.g., spam detection)
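To see the point of the circles/crosses picture in code: scikit-learn's make_circles data is not linearly separable, so a linear kernel struggles while an RBF kernel fits well (the gamma value is illustrative):
python
# Non-linear SVM: RBF kernel vs linear kernel on concentric circles
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, factor=0.4, noise=0.1, random_state=42)

linear = SVC(kernel='linear').fit(X, y)
rbf = SVC(kernel='rbf', gamma=2.0).fit(X, y)
print("linear kernel accuracy:", linear.score(X, y))  # roughly chance level
print("rbf kernel accuracy:", rbf.score(X, y))        # near-perfect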
Neural Networks and Deep Learning:
Introduction to Neural Networks:
Neural networks are a class of algorithms
inspired by the structure and functioning of the
human brain.
They form the foundation of many modern
artificial intelligence (AI) systems,
particularly in areas such as image
recognition, natural language processing,
and game playing.
What Is a Neural Network?
A neural network is a computational model
made up of layers of interconnected nodes (or
"neurons") that can learn to recognize patterns
from data.
Biological Analogy:
Neurons: Like brain cells, each node
processes input and passes on its output.
Synapses: Connections between nodes,
each with a weight representing its
strength.
Learning: Adjusting the weights to
improve predictions (akin to learning from
experience).
Components of a Neural Network
1. Input Layer
o Receives the input features (e.g., pixel
values of an image, text data).
2. Hidden Layers
o Perform computations and
transformations.
o Each node applies an activation
function to its inputs (like ReLU,
sigmoid).
3. Output Layer
o Produces the final prediction (e.g.,
class label, value).
How Does It Work?
The neural network processes input data in the
following steps:
1. Forward Propagation
o Data moves from input → hidden
layers → output.
o Each neuron computes a weighted sum
of inputs and passes it through an
activation function.
2. Loss Calculation
o The output is compared with the true
value using a loss function (e.g., Mean
Squared Error, Cross-Entropy).
3. Backpropagation
o The network calculates the gradient of
the loss with respect to each weight
using calculus (chain rule).
o Uses this to update the weights
(usually via gradient descent).
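A tiny NumPy sketch of one neuron's forward pass and chain-rule update, to make steps 1–3 concrete (the data, sigmoid activation, squared-error loss, and learning rate are all illustrative choices):
python
# One sigmoid neuron trained by gradient descent on a single example
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)   # input features
w = rng.normal(size=3)   # weights (synapse strengths)
b = 0.0                  # bias
target = 1.0             # true value

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for step in range(100):
    z = w @ x + b                  # forward: weighted sum of inputs
    y_hat = sigmoid(z)             # forward: activation function
    loss = 0.5 * (y_hat - target) ** 2
    # backpropagation (chain rule): dL/dz = dL/dy_hat * dy_hat/dz
    grad_z = (y_hat - target) * y_hat * (1 - y_hat)
    w -= 0.5 * grad_z * x          # gradient descent weight update
    b -= 0.5 * grad_z

print(f"final prediction: {sigmoid(w @ x + b):.3f}, loss: {loss:.5f}")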
Types of Neural Networks
Type: Use Case
Feedforward Neural Network: basic tasks (e.g., regression, classification)
Convolutional Neural Network (CNN): image and video recognition
Recurrent Neural Network (RNN): time-series or sequential data (e.g., language)
Transformer Networks: modern NLP models (e.g., ChatGPT)
Example Use Cases
Image Classification (e.g., recognizing cats vs
dogs)
Speech Recognition
Machine Translation
Medical Diagnosis
Stock Price Prediction
Why Are Neural Networks Powerful?
Can model nonlinear and complex relationships
Perform end-to-end learning from raw data
Scalable to large datasets and tasks
Building and training neural networks
using TensorFlow and Keras:
TensorFlow framework:
TensorFlow is an open-source machine
learning and deep learning framework
developed by Google.
It allows developers and researchers to build
and train machine learning models,
particularly neural networks, for a wide
variety of tasks such as:
Image recognition
Natural language processing
Time series prediction
Reinforcement learning
And many more
Key Features of TensorFlow:
Flexible architecture: Can run on CPUs,
GPUs, and TPUs (specialized hardware).
High-level APIs: Such as Keras, which
makes it easier to build and train models.
Scalable: Can run on everything from a
single smartphone to a large-scale
distributed system.
Support for production: TensorFlow
Serving, TensorFlow Lite, and
TensorFlow.js make it usable on servers,
mobile devices, and the web.
Example Use Case:
python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# Simple neural network for classification
model = Sequential([
Dense(64, activation='relu', input_shape=(100,)),
Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='categorical_crossentropy')
Keras Framework:
Keras is an open-source deep learning framework that
makes it easier to build and train neural networks.
It provides a user-friendly, high-level API designed to
work with TensorFlow, which is a more complex backend
framework for numerical computation.
Key Points about Keras:
High-level API: Keras is designed for human beings,
not machines.
It prioritizes ease of use, simplicity, and fast
prototyping.
Runs on top of TensorFlow: As of 2020, Keras is
tightly integrated into TensorFlow and is officially
part of it (tf.keras).
Modular and Extensible: You can build complex
neural network models using a modular approach —
layers, optimizers, loss functions, and metrics are all
easily configurable.
Used for deep learning: Applications include image
classification, natural language processing, time series
forecasting, etc.
Example:
Here's a simple Keras model for binary
classification:
python
import tensorflow as tf
from tensorflow.keras import layers, models
model = models.Sequential([
layers.Dense(64, activation='relu', input_shape=(10,)),
layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])
# Example data
import numpy as np
x_train = np.random.rand(1000, 10)
y_train = np.random.randint(0, 2, size=(1000,))
model.fit(x_train, y_train, epochs=10, batch_size=32)
This trains a simple neural network with:
1 hidden layer of 64 neurons (ReLU activation)
1 output layer (sigmoid for binary output)
TensorFlow, activation functions:
In TensorFlow, activation functions are crucial
components of neural networks that introduce non-
linearity, allowing the network to learn complex
patterns.
Here's a list of commonly used activation functions
in TensorFlow (tf.nn or tf.keras.activations), along
with brief descriptions:
Common Activation Functions
Activation: Description. TensorFlow Function
ReLU (Rectified Linear Unit): sets all negative values to 0; the most common choice due to simplicity and efficiency. Function: tf.nn.relu(x) or tf.keras.activations.relu(x)
Sigmoid: squashes input to the range (0, 1); often used in binary classification. Function: tf.nn.sigmoid(x) or tf.keras.activations.sigmoid(x)
Tanh (Hyperbolic Tangent): squashes input to (−1, 1); zero-centered, unlike sigmoid. Function: tf.nn.tanh(x) or tf.keras.activations.tanh(x)
Softmax: converts logits into probabilities that sum to 1; used in multi-class classification (last layer). Function: tf.nn.softmax(x) or tf.keras.activations.softmax(x)
Leaky ReLU: variant of ReLU that allows a small gradient when inactive. Function: tf.nn.leaky_relu(x, alpha=0.2)
ELU (Exponential Linear Unit): similar to ReLU, but smoother and has negative outputs. Function: tf.nn.elu(x)
SELU (Scaled ELU): self-normalizing variant of ELU; used in self-normalizing networks. Function: tf.nn.selu(x)
GELU (Gaussian Error Linear Unit): smooth approximation of ReLU; used in Transformers. Function: tf.nn.gelu(x)
Swish: x · sigmoid(x); often performs better than ReLU in deeper models. Function: tf.nn.swish(x) or tf.keras.activations.swish(x)
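A quick numeric check of a few of these functions on a sample tensor (the input values are illustrative):
python
import tensorflow as tf

x = tf.constant([-2.0, -0.5, 0.0, 1.0, 3.0])
print(tf.nn.relu(x).numpy())     # negatives clipped to 0
print(tf.nn.sigmoid(x).numpy())  # squashed into (0, 1)
print(tf.nn.tanh(x).numpy())     # squashed into (-1, 1)
print(tf.nn.softmax(x).numpy())  # probabilities summing to 1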
🔹 How to Use Them in Models
With tf.keras.layers.Dense:
python
import tensorflow as tf
model = tf.keras.Sequential([
tf.keras.layers.Dense(64, activation='relu'),
tf.keras.layers.Dense(1, activation='sigmoid')
])
Or with the functional API:
python
x = tf.keras.Input(shape=(10,))
y = tf.keras.layers.Dense(32, activation=tf.nn.elu)(x)
Example: TensorFlow.js
TensorFlow.js is a JavaScript library for training and deploying machine learning models in the browser.
Tensorflow Models
Models and Layers are important building blocks
in Machine Learning.
For different Machine Learning tasks you must
combine different types of Layers into a Model that
can be trained with data to predict future values.
TensorFlow.js supports different types of Models and Layers.
A TensorFlow Model is a Neural Network with one
or more Layers.
A Tensorflow Project
A Tensorflow project has this typical workflow:
Collecting Data
Creating a Model
Adding Layers to the Model
Compiling the Model
Training the Model
Using the Model
Example
Suppose you knew a function that defined a straight line:
Y = 1.2X + 5
Then you could calculate any y value with the
JavaScript formula:
y = 1.2 * x + 5;
To demonstrate Tensorflow.js, we could train a
Tensorflow.js model to predict Y values based on X
inputs.
Note
The TensorFlow model does not know the function.
const xs = tf.tensor([0, 1, 2, 3, 4]);
const ys = xs.mul(1.2).add(5);

// Define a Linear Regression Model
const model = tf.sequential();
model.add(tf.layers.dense({units: 1, inputShape: [1]}));

// Specify Loss and Optimizer
model.compile({loss: 'meanSquaredError', optimizer: 'sgd'});

// Train the Model
model.fit(xs, ys, {epochs: 500}).then(() => {myFunction()});

// Use the Model
function myFunction() {
  const xMax = 10;
  const xArr = [];
  const yArr = [];
  for (let x = 0; x <= xMax; x++) {
    let result = model.predict(tf.tensor([Number(x)]));
    result.data().then(y => {
      xArr.push(x);
      yArr.push(Number(y));
      // plot() is assumed to be a page-provided plotting helper
      if (x == xMax) {plot(xArr, yArr)};
    });
  }
}
What is Keras?
Keras is a deep learning API that simplifies the
process of building deep neural networks.
Initially developed as an independent library, Keras is now tightly integrated into TensorFlow as its official high-level API.
Historically it supported multiple backend engines, including TensorFlow, Theano, and Microsoft Cognitive Toolkit (CNTK).
Keras makes it easier to train and evaluate deep
learning models without requiring extensive
knowledge of low-level operations.
How to Install Keras?
1. Since Keras is now part of TensorFlow, it can be
installed easily using pip:
pip install tensorflow
This command installs TensorFlow 2.x, which
includes Keras.
2. To check the installation, open Python and run:
import tensorflow as tf
print('TensorFlow Version: ', tf.__version__)
print('Keras Version ', tf.keras.__version__)
Output:
TensorFlow Version: 2.18.0
Keras Version 3.8.0
How to Build a Model in Keras?
Keras provides two main ways to build models:
1. Sequential API
2. Functional API
The Sequential API is best suited to models with a single input, a single output, and a linear stack of layers. The Functional API is used for models that require multiple inputs or outputs, or layers that themselves have multiple inputs or outputs.
Building Model using Sequential API
Here’s how you can define a Sequential model:
We create a Sequential model.
Add a fully connected (Dense) layer with 64
units and ReLU activation.
Add another Dense layer with 10 units (for
classification) and a Softmax activation.
from keras.models import Sequential
from keras.layers import Dense, Activation
model = Sequential()
model.add(Dense(units=64, input_dim=100))
model.add(Activation('relu'))
model.add(Dense(units=10))
model.add(Activation('softmax'))
Building Model using Functional API
Functional API allows more flexibility in creating
complex architectures. You can create models with
shared layers, multiple inputs/outputs and skip
connections.
For example:
We define two input layers (input1 and input2).
Create separate hidden layers for each input.
Merge the hidden layers using the concatenate
function.
Finally, add an output layer with softmax activation.
from keras.layers import Input, Dense, concatenate
from keras.models import Model
input1 = Input(shape=(100,))
input2 = Input(shape=(50,))
hidden1 = Dense(64, activation='relu')(input1)
hidden2 = Dense(32, activation='relu')(input2)
merged = concatenate([hidden1, hidden2])
output = Dense(10, activation='softmax')(merged)
model = Model(inputs=[input1, input2],
outputs=output)
Example of keras API:
Here's a simple example of using the Keras API (a
high-level API of TensorFlow) to build, compile, and
train a neural network for classifying digits from the
MNIST dataset (handwritten digits 0–9). This is a
classic example in machine learning.
Keras API Example: MNIST Digit Classifier
python
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

# 1. Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# 2. Normalize the input data (scale pixels from 0-255 to 0-1)
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# Flatten the images from 28x28 to 784 (for a fully connected network)
x_train = x_train.reshape((x_train.shape[0], 28 * 28))
x_test = x_test.reshape((x_test.shape[0], 28 * 28))

# One-hot encode the labels (e.g., 3 becomes [0,0,0,1,0,0,0,0,0,0])
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

# 3. Build the neural network model using the Keras Sequential API
model = models.Sequential([
    layers.Dense(512, activation='relu', input_shape=(784,)),  # hidden layer with 512 neurons
    layers.Dense(10, activation='softmax')  # output layer with 10 neurons (10 classes)
])

# 4. Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# 5. Train the model
model.fit(x_train, y_train, epochs=5, batch_size=128, validation_split=0.1)

# 6. Evaluate the model on test data
test_loss, test_accuracy = model.evaluate(x_test, y_test)
print(f'Test accuracy: {test_accuracy:.4f}')

# 7. Make predictions
predictions = model.predict(x_test)

# 8. Show some predictions (reshape flattened images back to 28x28 for display;
# labels were one-hot encoded, so recover the digit with argmax)
for i in range(5):
    plt.imshow(x_test[i].reshape(28, 28), cmap='gray')
    plt.title(f"Predicted: {np.argmax(predictions[i])}, "
              f"Actual: {np.argmax(y_test[i])}")
    plt.show()
🧠 Key Concepts:
Sequential API: Used to build models layer-by-
layer.
Dense layer: Fully connected neural network
layer.
Activation functions: ReLU for hidden layers,
softmax for multi-class output.
Loss function: categorical_crossentropy for
multi-class classification.
Optimizer: adam, a widely used gradient
descent optimizer.
The MNIST dataset (Modified National Institute of Standards and
Technology database) is a benchmark dataset in machine learning
and computer vision, especially used for image classification
tasks.
📚 Overview of the MNIST Dataset
Feature: Description
Type: handwritten digit images (0 to 9)
Number of classes: 10 (digits 0 through 9)
Image size: 28 × 28 pixels (grayscale)
Number of training samples: 60,000
Number of test samples: 10,000
Data format: each image is a 28×28 pixel grayscale image (784 features if flattened)
Label format: a single digit (0–9)
Example Images from MNIST
Each image looks like a small, pixelated version of a handwritten
digit. Here's a visual example (you'd need to plot it in Python to
see):
python
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import mnist
(x_train, y_train), _ = mnist.load_data()
plt.imshow(x_train[0], cmap='gray')
plt.title(f"Label: {y_train[0]}")
plt.show()
Convolutional Neural Networks (CNNs):
Convolutional Neural Networks (CNNs) are a class
of deep neural networks that are particularly
effective for processing data with a grid-like
topology, such as images.
They're most commonly used in computer vision
tasks like image classification, object detection,
segmentation, and more.
Key Concepts of CNNs
1. Convolutional Layer
The core building block of a CNN.
Applies filters (kernels) that slide over the input
data to detect features like edges, textures, or
shapes.
Each filter produces a feature map (activation
map) that highlights where that feature occurs.
2. ReLU (Rectified Linear Unit)
Applies a non-linear activation function to
introduce non-linearity.
Commonly: ReLU(x) = max(0, x)
3. Pooling Layer
Downsamples the feature maps to reduce dimensionality and computation.
Common types:
o Max Pooling: takes the maximum value in
each window.
o Average Pooling: takes the average.
4. Fully Connected (Dense) Layer
After several convolution and pooling layers, the
output is flattened and passed through one or
more dense layers for classification or regression
tasks.
5. Softmax / Sigmoid (Output Layer)
Used in the final layer depending on the task:
o Softmax for multi-class classification.
o Sigmoid for binary classification.
Typical CNN Architecture
Input Image → [Conv → ReLU → Pool]*N →
Flatten → Dense → Output
Example: Image Classification
For a handwritten digit recognition task (like
MNIST), a CNN might:
Detect edges in the first layer.
Identify digit parts (like curves or lines) in
deeper layers.
Combine these to classify the final digit.
Libraries to Use CNNs
TensorFlow / Keras
PyTorch
OpenCV (for pre-processing and some model
loading)
CNN Example in Python
(TensorFlow/Keras):
import tensorflow as tf
from tensorflow.keras import layers, models

# Load and preprocess the MNIST dataset
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Normalize pixel values to 0-1 and reshape for CNN input
x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255
x_test = x_test.reshape(-1, 28, 28, 1).astype("float32") / 255

# Build the CNN model
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')  # 10 classes for digits 0-9
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
history = model.fit(x_train, y_train, epochs=5, validation_split=0.1)

# Evaluate the model
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test accuracy: {test_acc:.4f}")
What This Does:
Conv2D: Applies filters to detect features.
MaxPooling2D: Reduces spatial dimensions.
Flatten: Converts 2D data to 1D for the dense layer.
Dense: Fully connected layers for classification.
Recurrent Neural Networks (RNN).
What is an RNN?
A Recurrent Neural Network (RNN) is a
type of neural network designed to work with
sequential data, such as:
Text
Time series (like stock prices)
Audio
Video frames
RNNs remember past information using
loops in the network, which makes them good
at modeling sequences and time dependencies.
How Do RNNs Work?
Unlike traditional neural networks (like
feedforward networks), RNNs have memory
of previous inputs.
For a sequence of inputs:
x₁, x₂, x₃, ..., xₜ
An RNN processes each input step by step:
At time t:
Input: xₜ
Hidden state: hₜ = f(Wxₜ + Uhₜ₋₁ + b)
Output: yₜ = g(hₜ)
hₜ: hidden state (the network's memory)
W: input weight matrix (applied to the input xₜ)
U: recurrent weight matrix (applied to the previous hidden state hₜ₋₁)
f: activation function (e.g., tanh or ReLU)
g: output function
This loop allows RNNs to carry information
from earlier in the sequence to later steps.
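A tiny NumPy sketch of this recurrence with random, untrained weights, just to show how hₜ carries information forward step by step (the sequence and layer sizes are illustrative):
python
# One pass of h_t = f(W*x_t + U*h_(t-1) + b) over a 3-step sequence
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=4)        # input weights (1 feature -> 4 hidden units)
U = rng.normal(size=(4, 4))   # recurrent (hidden-to-hidden) weights
b = np.zeros(4)

h = np.zeros(4)               # initial hidden state
for x_t in [0.5, -1.0, 2.0]:  # the sequence x1, x2, x3
    h = np.tanh(W * x_t + U @ h + b)   # new state mixes input and memory
    print(h)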
Applications of RNNs
Language modeling and generation
(e.g., GPT-style models started with RNN
ideas)
Machine translation
Speech recognition
Stock price prediction
Music generation
Example of a Recurrent Neural Network (RNN):
Below is an example of an RNN in Python with TensorFlow/Keras. It shows how to use an RNN for sequence prediction, specifically predicting the next number in a sequence.
Goal:
Given a sequence of numbers, the RNN learns
to predict the next number.
For example, given [1, 2, 3], predict 4.
Step-by-Step RNN Example in Python
(with TensorFlow/Keras)
python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

# 1. Prepare the data
# Input sequences (X): [1, 2, 3], [2, 3, 4], ..., [7, 8, 9]
# Target values (y): 4, 5, ..., 10
X = []
y = []
for i in range(1, 8):
    X.append([i, i+1, i+2])
    y.append(i + 3)

# Convert to NumPy arrays
X = np.array(X)
y = np.array(y)

# Reshape X to [samples, time steps, features]
X = X.reshape((X.shape[0], X.shape[1], 1))

# 2. Define the RNN model
model = Sequential([
    SimpleRNN(10, activation='relu', input_shape=(3, 1)),
    Dense(1)
])

# 3. Compile and train
model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=200, verbose=0)

# 4. Predict
test_input = np.array([[8, 9, 10]]).reshape((1, 3, 1))
predicted = model.predict(test_input, verbose=0)
print(f"Predicted next number after [8, 9, 10]: {predicted[0][0]:.2f}")
Explanation:
Input shape: (samples, time_steps,
features) — here, each sequence has 3
time steps, with 1 feature per step.
SimpleRNN: A basic RNN layer that
processes sequences.
Dense: Fully connected output layer
producing a single number.