
Module-4

Advanced Machine Learning Techniques:


Ensemble Methods:

Ensemble Learning is a machine learning paradigm


where multiple models (often called "weak
learners") are trained and combined to solve the
same problem to achieve better performance than
any single model alone.
Here are the main ensemble learning techniques:

✅ 1. Bagging (Bootstrap Aggregating)


 Goal: Reduce variance (overfitting).

 How: Train multiple models on different

random subsets (with replacement) of the


training data, and average (or vote) their
predictions.
 Common Algorithm:

o Random Forest (ensemble of decision trees)

 Use case: High-variance models like decision

trees.

✅ 2. Boosting
 Goal: Reduce bias (underfitting).

 How: Train models sequentially; each model

learns from the mistakes of the previous one by


giving more weight to misclassified instances.
 Popular Algorithms:

o AdaBoost (Adaptive Boosting)


o Gradient Boosting Machines (GBM)
o XGBoost, LightGBM, CatBoost (optimized

implementations)
 Use case: Structured/tabular data with
moderate to large size.

✅ 3. Stacking (Stacked Generalization)


 Goal: Combine multiple diverse models using a

meta-model.
 How: Train different models on the training set,

then use their predictions as input features to


train a higher-level model (meta-learner).
 Example:

o Level-0 models: SVM, Logistic Regression,

Random Forest
o Level-1 model (meta-model): Logistic

Regression
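
To make this concrete, here is a minimal scikit-learn sketch of the stacking setup described above; the synthetic dataset and model settings are illustrative assumptions rather than part of the original example.
python
# Minimal stacking sketch with scikit-learn (synthetic data assumed for illustration)
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Level-0 models: SVM, Logistic Regression, Random Forest
level0 = [
    ('svm', SVC(probability=True, random_state=42)),
    ('lr', LogisticRegression(max_iter=1000)),
    ('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
]

# Level-1 meta-model: Logistic Regression trained on the level-0 predictions
stack = StackingClassifier(estimators=level0, final_estimator=LogisticRegression())
stack.fit(X_train, y_train)
print("Stacking accuracy:", stack.score(X_test, y_test))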

✅ 4. Voting
 Goal: Simple combination of predictions from

different models.
 Types:

o Hard Voting: Majority class wins.

o Soft Voting: Average predicted probabilities

(requires probability outputs).


 Use case: When you have a mix of models and

want a quick ensemble.
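
As a quick illustration, the sketch below builds both a hard-voting and a soft-voting ensemble with scikit-learn's VotingClassifier; the synthetic data and the chosen base models are assumptions for demonstration only.
python
# Minimal voting sketch with scikit-learn (synthetic data assumed)
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, random_state=0)

estimators = [
    ('lr', LogisticRegression(max_iter=1000)),
    ('rf', RandomForestClassifier(n_estimators=100, random_state=0)),
    ('nb', GaussianNB()),
]

hard_vote = VotingClassifier(estimators=estimators, voting='hard')  # majority class wins
soft_vote = VotingClassifier(estimators=estimators, voting='soft')  # average predicted probabilities

hard_vote.fit(X, y)
soft_vote.fit(X, y)
print("Hard voting accuracy:", hard_vote.score(X, y))
print("Soft voting accuracy:", soft_vote.score(X, y))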


✅ 5. Blending
 Similar to stacking but uses a hold-out

validation set instead of cross-validation to train


the meta-model.
 Faster but may lead to overfitting if the

validation set is too small.

Two widely used boosting implementations, Gradient Boosting Machines (GBM) and Extreme Gradient Boosting (XGBoost), are covered in detail later in this module.

Ensemble Methods: Bagging and Boosting:

Ensemble methods are machine learning techniques


that combine predictions from multiple models to
improve performance, reduce variance, and prevent
overfitting. Two of the most popular ensemble
methods are Bagging and Boosting.

1. Bagging (Bootstrap Aggregating)


Bagging handles variance (overfitting).
Key Idea: Train multiple models independently on
different random subsets of the data and aggregate
their predictions (e.g., by voting or averaging).
 Process:
1. Generate multiple bootstrap samples
(random samples with replacement) from
the training data.
2. Train a separate model (usually the same
type, like decision trees) on each sample.
3. Combine predictions:
 Classification: majority vote

 Regression: average

 Goal: Reduce variance and improve model


stability.
 Example Algorithms:
o Random Forest (Bagging applied to decision
trees, with additional randomness in feature
selection)
 Advantages:
o Handles overfitting well (especially for high-
variance models).
o Easy to parallelize.
 Disadvantages:
o Doesn’t reduce bias (underfitting)
significantly.
o Large ensembles can be slow in prediction
time.
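
The sketch below shows one way bagging can be set up with scikit-learn's BaggingClassifier over decision trees; the synthetic dataset and parameter values are illustrative assumptions.
python
# Minimal bagging sketch with scikit-learn (synthetic data and settings assumed)
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each tree is trained on a bootstrap sample; predictions are combined by majority vote
# (in scikit-learn versions before 1.2 the first parameter is named base_estimator)
bagging = BaggingClassifier(estimator=DecisionTreeClassifier(),
                            n_estimators=50,
                            bootstrap=True,   # sample with replacement
                            random_state=42)
bagging.fit(X_train, y_train)
print("Bagging accuracy:", bagging.score(X_test, y_test))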

2. Boosting
Boosting is used to handle bias (underfitting).
Key Idea: Train models sequentially, where each
new model focuses on correcting the errors of the
previous ones.
 Process:

1. Start with an initial weak model (e.g., a


shallow decision tree).
2. Train the next model to focus more on data
points that were mispredicted by the
previous model.
3. Combine the models using a weighted
majority or sum of predictions.
 Goal: Reduce bias and build a strong predictive

model from weak learners.


 Popular Boosting Algorithms:

o AdaBoost (Adaptive Boosting)


o Gradient Boosting Machines (GBM)
o XGBoost
 Advantages:
 Often achieves higher accuracy than bagging,

especially on structured/tabular data.


 Can handle bias better.

 Disadvantages:
 More prone to overfitting if not carefully

regularized.
 Sequential process is harder to parallelize.

 More sensitive to noisy data and outliers.
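
For illustration, here is a minimal AdaBoost sketch in scikit-learn; the synthetic data and hyperparameter values are assumptions chosen only for demonstration.
python
# Minimal AdaBoost sketch with scikit-learn (synthetic data and settings assumed)
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each stage reweights the training points that the previous stage misclassified
ada = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=42)
ada.fit(X_train, y_train)
print("AdaBoost accuracy:", ada.score(X_test, y_test))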

Gradient Boosting Machines (GBM)


Gradient Boosting Machines (GBM) are a powerful
ensemble machine learning technique used for both
regression and classification tasks.
GBM builds models in a sequential, stage-wise
fashion by combining multiple weak learners,
typically decision trees, into a strong predictive
model.
Core Idea of GBM
GBM minimizes a loss function by adding models
(e.g., trees) that correct the errors (residuals) made
by previous models.
The algorithm uses gradient descent to optimize the
model step-by-step.

How GBM Works (Step-by-Step)


1. Initialize the model:
o Start with a simple model (e.g., mean of

targets for regression).


2. Compute residuals:
o Calculate the difference between the actual

and predicted values.


3. Fit a weak learner:
o Train a shallow decision tree (often with

depth 1–5) on the residuals.


4. Update the model:
o Add the new tree's predictions to the
existing model with a learning rate η.
o New prediction: Fm(x) = Fm−1(x) + η · hm(x)
5. Repeat:
 Continue this process for a specified number of
iterations or until convergence.
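A hand-rolled sketch of this loop for regression with squared-error loss is shown below (for that loss the residuals are exactly the negative gradients); the toy data and settings are assumptions for illustration, not part of the original notes.
python
# Hand-rolled boosting loop sketching the update F_m(x) = F_{m-1}(x) + eta * h_m(x)
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

eta = 0.1                      # learning rate
F = np.full_like(y, y.mean())  # step 1: initialize with the mean of the targets
trees = []                     # the ensemble being built, one tree per stage

for m in range(100):
    residuals = y - F                        # step 2: residuals (negative gradient of MSE)
    h = DecisionTreeRegressor(max_depth=2)   # step 3: fit a shallow tree to the residuals
    h.fit(X, residuals)
    F = F + eta * h.predict(X)               # step 4: F_m = F_{m-1} + eta * h_m
    trees.append(h)

print("Final training MSE:", np.mean((y - F) ** 2))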
Key Components
 Loss function: Determines what the model is

optimizing (e.g., MSE for regression, log-loss for


classification).
 Weak learner: Usually decision trees.

 Learning rate η: Controls the contribution of

each tree.
 Number of trees: More trees can improve
accuracy but increase training time and risk
overfitting.
 Subsampling: Often used (as in Stochastic
GBM) to improve generalization and reduce
overfitting.
Advantages
 Handles non-linear relationships well.

 Highly accurate and flexible.

 Can work with different types of loss functions.

 Supports both classification and regression.

Disadvantages
 Computationally intensive (training is slower).

 Can overfit if not properly tuned.

 Less interpretable than linear models.

Example of a Gradient Boosting


Machine (GBM)
Problem: Predict whether a customer will buy a
product (binary classification)
We’ll use the popular sklearn.datasets for a
synthetic dataset.
Code Example (Gradient Boosting
Classifier):
python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# 1. Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=42)

# 2. Split into training and testing data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)

# 3. Initialize the Gradient Boosting Classifier
gbm = GradientBoostingClassifier(n_estimators=100,   # number of boosting stages
                                 learning_rate=0.1,  # shrinkage rate
                                 max_depth=3,        # depth of each tree
                                 random_state=42)

# 4. Fit the model
gbm.fit(X_train, y_train)

# 5. Predict on test data
y_pred = gbm.predict(X_test)

# 6. Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

Extreme Gradient Boosting (XGBoost):


Extreme Gradient Boosting (XGBoost) is a powerful
and efficient implementation of the gradient
boosting framework.
It's widely used in machine learning competitions
(like those on Kaggle) and industry applications due
to its high performance, scalability, and ability to
handle various data types and problem settings.

What is XGBoost?
 XGBoost stands for Extreme Gradient Boosting.

 It is a decision-tree-based ensemble Machine

Learning algorithm that uses a gradient boosting


framework.
 Developed by Tianqi Chen, it aims to be highly

efficient, flexible, and portable.


How It Works
XGBoost builds an ensemble (collection) of decision
trees, where each new tree attempts to correct the errors
of the previous ones.
1. Initial Prediction: Start with an initial model (like
predicting the mean).
2. Compute Residuals: Find the difference between
predicted and actual values.
3. Train a Tree: Train a tree to predict the residuals
(errors).
4. Update Model: Add the predictions from the new
tree to improve the model.
5. Repeat: Continue adding trees until stopping
criteria are met.
This is similar to traditional gradient boosting but with
significant performance improvements.
Key Features of XGBoost
 Regularization: Helps prevent overfitting using L1 (Lasso) and L2 (Ridge) penalties.
 Parallel Processing: Speeds up training using multi-core processing.
 Tree Pruning: Uses max depth rather than max number of nodes (pre-pruning).
 Handling Missing Data: Automatically learns the best way to handle missing values.
 Custom Loss Functions: Allows user-defined objective functions.
 Sparsity Awareness: Optimized for sparse (missing) input data.
 Cross-validation: Built-in support for k-fold cross-validation.

Key Parameters
 n_estimators: Number of boosting rounds (trees).
 max_depth: Maximum depth of a tree.
 learning_rate: Step size shrinkage (also called eta).
 subsample: Fraction of data used per tree.
 colsample_bytree: Fraction of features used per tree.
 gamma: Minimum loss reduction required to make a split.
 lambda, alpha: L2 and L1 regularization on weights.
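
As a rough illustration of how these parameters are typically passed to the scikit-learn style XGBClassifier, here is a sketch with placeholder values; the values themselves are assumptions, not recommended settings.
python
# Illustrative XGBoost classifier using the hyperparameters listed above
import xgboost as xgb

model = xgb.XGBClassifier(
    n_estimators=200,      # number of boosting rounds (trees)
    max_depth=4,           # maximum depth of a tree
    learning_rate=0.1,     # step size shrinkage (eta)
    subsample=0.8,         # fraction of data used per tree
    colsample_bytree=0.8,  # fraction of features used per tree
    gamma=0.1,             # minimum loss reduction required to make a split
    reg_lambda=1.0,        # L2 regularization on weights (lambda)
    reg_alpha=0.0,         # L1 regularization on weights (alpha)
)
# model.fit(X_train, y_train) would then train it exactly as in the example below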

How to Use XGBoost in Python

import xgboost as xgb
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load data (load_boston has been removed from recent scikit-learn releases,
# so the California housing dataset is used here instead)
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Train model
model = xgb.XGBRegressor(objective='reg:squarederror', n_estimators=100)
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
Introduction to Support Vector Machines (SVM)
Support Vector Machine (SVM) is a supervised
machine learning algorithm used for classification,
regression, and even outlier detection. It is particularly
well-known for its effectiveness in binary
classification problems.

Key Concepts of SVM:

1. Hyperplane
 A decision boundary that separates data points from

different classes.
 In 2D, it's a line; in 3D, a plane; in higher

dimensions, it's called a hyperplane.


 SVM aims to find the optimal hyperplane that

best separates the data.


2. Support Vectors
 The data points closest to the hyperplane.

 These points are critical because they directly

influence the position and orientation of the


hyperplane.
 Removing them would change the decision

boundary.
3. Margin
 The distance between the hyperplane and the

nearest data points (support vectors) on either side.


 SVM seeks to maximize this margin for better

generalization.
4. Linearly Separable vs Non-linearly Separable
Data
 For linearly separable data, SVM can find a

straight hyperplane.
 For non-linearly separable data, SVM uses a

kernel trick to project data into higher dimensions


where a linear separator can exist.

The Kernel Trick:

Kernels allow SVM to operate in a higher-dimensional


space without explicitly computing the transformation.
Common kernels include:

Linear: K(x, x′) = x⊤x′
Polynomial: K(x, x′) = (x⊤x′ + c)^d
RBF (Gaussian): K(x, x′) = exp(−γ∥x − x′∥²)
Sigmoid: K(x, x′) = tanh(αx⊤x′ + c)

How SVM Works :


1. Input labeled training data (e.g., cats vs dogs).
2. Choose a kernel function (linear or non-linear).
3. Find the optimal hyperplane that maximizes the
margin between classes.
4. Classify new data based on which side of the
hyperplane they fall on.
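
A minimal scikit-learn sketch of these four steps is shown below; the synthetic two-feature dataset and parameter choices are assumptions for illustration.
python
# Sketch of the four steps above using scikit-learn's SVC (synthetic data assumed)
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# 1. Labeled training data
X, y = make_classification(n_samples=300, n_features=2, n_redundant=0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# 2. Choose a kernel; 3. fit() finds the maximum-margin hyperplane
clf = SVC(kernel='linear', C=1.0)
clf.fit(X_train, y_train)

# 4. Classify new data by which side of the hyperplane it falls on
print("Predictions:", clf.predict(X_test[:5]))
print("Test accuracy:", clf.score(X_test, y_test))
print("Support vectors per class:", clf.n_support_)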

Linear Support Vector Machine (Linear SVM):

A Linear Support Vector Machine (Linear SVM) is a


supervised machine learning algorithm used primarily
for binary classification tasks. It tries to find the best
separating hyperplane (a straight line in 2D, a plane in
3D, etc.) that maximizes the margin between two
classes.
Key Concepts
1. Hyperplane:
o A decision boundary that separates classes.

o In 2D: a line; in 3D: a plane; in higher

dimensions: a hyperplane.
2. Margin:
o The distance between the hyperplane and the

closest data points from each class.


o A larger margin generally indicates a better

generalization to unseen data.


3. Support Vectors:
o The data points that are closest to the

hyperplane.
o They "support" or define the position and

orientation of the hyperplane.


4. Linear Kernel:
o Linear SVM assumes that data is linearly
separable, meaning a straight line (or
hyperplane) can separate the classes.

Objective Function
The optimization problem is:
Minimize: (1/2)∥w∥²
Subject to: yi(w⋅xi + b) ≥ 1 for all i
Where:
 w = weight vector (normal to the

hyperplane)
 b = bias term

 xi = feature vector for sample i

 yi = label (+1 or -1)

Common Uses
 Text classification (e.g., spam detection)

 Image classification

 Bioinformatics (e.g., cancer detection)

Non-Linear Support Vector Machine:


A non-linear Support Vector Machine
(SVM) is an extension of the basic (linear)
SVM that can handle data that is not linearly
separable by transforming it into a higher-
dimensional space where a linear separation is
possible.
An SVM is a supervised machine learning
algorithm used for classification and
regression. It works by finding the optimal
hyperplane that separates data points of
different classes with the maximum margin.

Why Non-Linear?
In many real-world problems, the data is not
linearly separable. A straight line (in 2D) or
hyperplane (in higher dimensions) cannot
divide the classes perfectly.
✅ Example:
Imagine trying to classify this:
Class A (circles): ⭕
Class B (crosses): ❌

⭕ ❌

❌ ⭕

⭕ ❌
No straight line can separate them well — so
we need to go non-linear.
How It Works: The Kernel Trick
Instead of explicitly computing the
coordinates in higher dimensions, SVM uses a
kernel function to compute the dot product
in that space.
This allows the algorithm to fit the maximum-
margin hyperplane in a transformed feature
space.
Linear: K(x, x′) = x⊤x′
Polynomial: K(x, x′) = (x⊤x′ + c)^d
RBF (Gaussian): K(x, x′) = exp(−γ∥x − x′∥²)
Sigmoid: K(x, x′) = tanh(αx⊤x′ + c)
Real-World Applications
 Image classification

 Handwriting recognition (e.g., MNIST)

 Bioinformatics (e.g., gene classification)

 Text classification (e.g., spam detection)
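
To illustrate the difference, the sketch below compares a linear kernel and an RBF kernel on the classic two-moons dataset; the dataset choice and parameter values are assumptions for demonstration.
python
# Non-linear SVM sketch: an RBF kernel on data a straight line cannot separate
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_svm = SVC(kernel='linear').fit(X_train, y_train)
rbf_svm = SVC(kernel='rbf', gamma=1.0, C=1.0).fit(X_train, y_train)

# The RBF kernel implicitly maps the data to a higher-dimensional space,
# so it typically separates the two interleaving half-moons far better
print("Linear kernel accuracy:", linear_svm.score(X_test, y_test))
print("RBF kernel accuracy:", rbf_svm.score(X_test, y_test))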

Neural Networks and Deep Learning:


Introduction to Neural Networks:
Neural networks are a class of algorithms
inspired by the structure and functioning of the
human brain.
They form the foundation of many modern
artificial intelligence (AI) systems,
particularly in areas such as image
recognition, natural language processing,
and game playing.

What Is a Neural Network?


A neural network is a computational model
made up of layers of interconnected nodes (or
"neurons") that can learn to recognize patterns
from data.
Biological Analogy:
 Neurons: Like brain cells, each node

processes input and passes on its output.


 Synapses: Connections between nodes,

each with a weight representing its


strength.
 Learning: Adjusting the weights to

improve predictions (akin to learning from


experience).
Components of a Neural Network
1. Input Layer
o Receives the input features (e.g., pixel

values of an image, text data).


2. Hidden Layers
o Perform computations and

transformations.
o Each node applies an activation

function to its inputs (like ReLU,


sigmoid).
3. Output Layer
o Produces the final prediction (e.g.,

class label, value).


How Does It Work?
The neural network processes input data in the
following steps:
1. Forward Propagation
o Data moves from input → hidden

layers → output.
o Each neuron computes a weighted sum

of inputs and passes it through an


activation function.
2. Loss Calculation
o The output is compared with the true

value using a loss function (e.g., Mean


Squared Error, Cross-Entropy).
3. Backpropagation
o The network calculates the gradient of
the loss with respect to each weight
using calculus (chain rule).
o Uses this to update the weights
(usually via gradient descent).
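
A minimal NumPy sketch of one such training step for a single neuron is shown below; the input values, weights, and learning rate are made-up numbers used only to trace the computation.
python
# Minimal sketch of one training step (one neuron, squared-error loss)
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.array([0.5, -1.2, 0.3])   # input features (illustrative values)
w = np.array([0.1, 0.4, -0.2])   # weights
b = 0.0                          # bias
y_true = 1.0                     # target

# 1. Forward propagation: weighted sum, then activation
z = np.dot(w, x) + b
y_pred = sigmoid(z)

# 2. Loss calculation (squared error for a single sample)
loss = (y_pred - y_true) ** 2

# 3. Backpropagation: chain rule gives the gradient of the loss w.r.t. each weight
dloss_dypred = 2 * (y_pred - y_true)
dypred_dz = y_pred * (1 - y_pred)
grad_w = dloss_dypred * dypred_dz * x
grad_b = dloss_dypred * dypred_dz

# 4. Gradient descent update
lr = 0.1
w = w - lr * grad_w
b = b - lr * grad_b
print(f"loss={loss:.4f}, updated weights={w}")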
Types of Neural Networks
 Feedforward Neural Network: Basic tasks (e.g., regression, classification)
 Convolutional Neural Network (CNN): Image and video recognition
 Recurrent Neural Network (RNN): Time-series or sequential data (e.g., language)
 Transformer Networks: Modern NLP models (e.g., ChatGPT)
Example Use Cases
 Image Classification (e.g., recognizing cats vs

dogs)
 Speech Recognition

 Machine Translation

 Medical Diagnosis

 Stock Price Prediction

Why Are Neural Networks Powerful?


 Can model nonlinear and complex relationships

 Perform end-to-end learning from raw data

 Scalable to large datasets and tasks


Building and training neural networks
using TensorFlow and Keras:

TensorFlow framework:
TensorFlow is an open-source machine
learning and deep learning framework
developed by Google.
It allows developers and researchers to build
and train machine learning models,
particularly neural networks, for a wide
variety of tasks such as:
 Image recognition

 Natural language processing

 Time series prediction

 Reinforcement learning

 And many more

Key Features of TensorFlow:


 Flexible architecture: Can run on CPUs,

GPUs, and TPUs (specialized hardware).


 High-level APIs: Such as Keras, which

makes it easier to build and train models.


 Scalable: Can run on everything from a
single smartphone to a large-scale
distributed system.
 Support for production: TensorFlow
Serving, TensorFlow Lite, and
TensorFlow.js make it usable on servers,
mobile devices, and the web.
Example Use Case:

python

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Simple neural network for classification


model = Sequential([
Dense(64, activation='relu', input_shape=(100,)),
Dense(10, activation='softmax')
])

model.compile(optimizer='adam', loss='categorical_crossentropy')

Keras Framework:
Keras is an open-source deep learning framework that
makes it easier to build and train neural networks.
It provides a user-friendly, high-level API designed to
work with TensorFlow, which is a more complex backend
framework for numerical computation.
Key Points about Keras:
 High-level API: Keras is designed for human beings,
not machines.
 It prioritizes ease of use, simplicity, and fast
prototyping.
 Runs on top of TensorFlow: As of 2020, Keras is

tightly integrated into TensorFlow and is officially


part of it (tf.keras).
 Modular and Extensible: You can build complex

neural network models using a modular approach —


layers, optimizers, loss functions, and metrics are all
easily configurable.
 Used for deep learning: Applications include image

classification, natural language processing, time series


forecasting, etc.
Example:
Here's a simple Keras model for binary
classification:
python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
layers.Dense(64, activation='relu', input_shape=(10,)),
layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])

# Example data
import numpy as np
x_train = np.random.rand(1000, 10)
y_train = np.random.randint(0, 2, size=(1000,))
model.fit(x_train, y_train, epochs=10, batch_size=32)

This trains a simple neural network with:


 1 hidden layer of 64 neurons (ReLU activation)

 1 output layer (sigmoid for binary output)

TensorFlow activation functions:

In TensorFlow, activation functions are crucial


components of neural networks that introduce non-
linearity, allowing the network to learn complex
patterns.
Here's a list of commonly used activation functions
in TensorFlow (tf.nn or tf.keras.activations), along
with brief descriptions:
Common Activation Functions
 ReLU (Rectified Linear Unit): Sets all negative values to 0; most common due to simplicity and efficiency. TensorFlow: tf.nn.relu(x) or tf.keras.activations.relu(x)
 Sigmoid: Squashes input to the range (0, 1); often used in binary classification. TensorFlow: tf.nn.sigmoid(x) or tf.keras.activations.sigmoid(x)
 Tanh (Hyperbolic Tangent): Squashes input to (-1, 1); zero-centered unlike sigmoid. TensorFlow: tf.nn.tanh(x) or tf.keras.activations.tanh(x)
 Softmax: Converts logits into probabilities that sum to 1; used in multi-class classification (last layer). TensorFlow: tf.nn.softmax(x) or tf.keras.activations.softmax(x)
 Leaky ReLU: Variant of ReLU that allows a small gradient when inactive. TensorFlow: tf.nn.leaky_relu(x, alpha=0.2)
 ELU (Exponential Linear Unit): Similar to ReLU, but smoother and has negative outputs. TensorFlow: tf.nn.elu(x)
 SELU (Scaled ELU): Self-normalizing variant of ELU; used in self-normalizing networks. TensorFlow: tf.nn.selu(x)
 GELU (Gaussian Error Linear Unit): Smooth approximation of ReLU; used in Transformers. TensorFlow: tf.nn.gelu(x)
 Swish: x * sigmoid(x); often better than ReLU in deeper models. TensorFlow: tf.nn.swish(x) or tf.keras.activations.swish(x)
🔹 How to Use Them in Models
With tf.keras.layers.Dense:
python

import tensorflow as tf

model = tf.keras.Sequential([
tf.keras.layers.Dense(64, activation='relu'),
tf.keras.layers.Dense(1, activation='sigmoid')
])
Or with the functional API:
python
x = tf.keras.Input(shape=(10,))
y = tf.keras.layers.Dense(32, activation=tf.nn.elu)(x)

Example: TensorFlow.js
TensorFlow.js is a JavaScript library for training and deploying machine learning models in the browser.

Tensorflow Models
Models and Layers are important building blocks
in Machine Learning.
For different Machine Learning tasks you must
combine different types of Layers into a Model that
can be trained with data to predict future values.
TensorFlow.js supports different types of Models and different types of Layers.
A TensorFlow Model is a Neural Network with one
or more Layers.

A TensorFlow Project
A TensorFlow project has this typical workflow:
 Collecting Data

 Creating a Model

 Adding Layers to the Model

 Compiling the Model

 Training the Model

 Using the Model

Example
Suppose you knew a function that defined a straight line:

Y = 1.2X + 5

Then you could calculate any y value with the


JavaScript formula:

y = 1.2 * x + 5;

To demonstrate Tensorflow.js, we could train a


Tensorflow.js model to predict Y values based on X
inputs.

Note
The TensorFlow model does not know the function.

const xs = tf.tensor([0, 1, 2, 3, 4]);
const ys = xs.mul(1.2).add(5);

// Define a Linear Regression Model
const model = tf.sequential();
model.add(tf.layers.dense({units: 1, inputShape: [1]}));

// Specify Loss and Optimizer
model.compile({loss: 'meanSquaredError', optimizer: 'sgd'});

// Train the Model
model.fit(xs, ys, {epochs: 500}).then(() => {myFunction()});

// Use the Model
function myFunction() {
  const xMax = 10;
  const xArr = [];
  const yArr = [];
  for (let x = 0; x <= xMax; x++) {
    let result = model.predict(tf.tensor([Number(x)]));
    result.data().then(y => {
      xArr.push(x);
      yArr.push(Number(y));
      if (x == xMax) {plot(xArr, yArr)};  // plot() is a charting helper defined elsewhere in the original example
    });
  }
}

What is Keras?
Keras is a deep learning API that simplifies the process of building deep neural networks.
Initially developed as an independent library, Keras is now tightly integrated into TensorFlow as its official high-level API.
Earlier standalone versions supported multiple backend engines such as TensorFlow, Theano, and Microsoft Cognitive Toolkit (CNTK).
Keras makes it easier to train and evaluate deep learning models without requiring extensive knowledge of low-level operations.

How to Install Keras?


1. Since Keras is now part of TensorFlow, it can be
installed easily using pip:
pip install tensorflow
This command installs TensorFlow 2.x, which
includes Keras.
2. To check the installation, open Python and run:
import tensorflow as tf
print('TensorFlow Version: ', tf.__version__)
print('Keras Version ', tf.keras.__version__)
Output:
TensorFlow Version: 2.18.0
Keras Version 3.8.0

How to Build a Model in Keras?


Keras provides two main ways to build models:
1. Sequential API
2. Functional API

The Sequential API is easy to work with for models that have a single input, a single output, and a linear stack of layers.
The Functional API can be used for models that require multiple inputs and outputs, or whose layers have multiple inputs or outputs.
Building Model using Sequential API
Here’s how you can define a Sequential model:
 We create a Sequential model.

 Add a fully connected (Dense) layer with 64

units and ReLU activation.


 Add another Dense layer with 10 units (for

classification) and a Softmax activation.

from keras.models import Sequential


from keras.layers import Dense, Activation

model = Sequential()
model.add(Dense(units=64, input_dim=100))
model.add(Activation('relu'))
model.add(Dense(units=10))
model.add(Activation('softmax'))

Building Model using Functional API


Functional API allows more flexibility in creating
complex architectures. You can create models with
shared layers, multiple inputs/outputs and skip
connections.
For example:
 We define two input layers (input1 and input2).

 Create separate hidden layers for each input.

 Merge the hidden layers using the concatenate

function.
 Finally, add an output layer with SoftMax

activation.
from keras.layers import Input, Dense, concatenate
from keras.models import Model

input1 = Input(shape=(100,))
input2 = Input(shape=(50,))
hidden1 = Dense(64, activation='relu')(input1)
hidden2 = Dense(32, activation='relu')(input2)
merged = concatenate([hidden1, hidden2])
output = Dense(10, activation='softmax')(merged)

model = Model(inputs=[input1, input2],


outputs=output)

Example of keras API:


Here's a simple example of using the Keras API (a
high-level API of TensorFlow) to build, compile, and
train a neural network for classifying digits from the
MNIST dataset (handwritten digits 0–9). This is a
classic example in machine learning.

Keras API Example: MNIST Digit Classifier


python
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

# 1. Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# 2. Normalize the input data (scale pixels from 0-255 to 0-1)
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# 3. Flatten the images from 28x28 to 784 (for a fully connected network)
x_train = x_train.reshape((x_train.shape[0], 28 * 28))
x_test = x_test.reshape((x_test.shape[0], 28 * 28))

# 4. One-hot encode the labels (e.g., 3 becomes [0,0,0,1,0,0,0,0,0,0])
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

# 5. Build the neural network model using the Keras Sequential API
model = models.Sequential([
    layers.Dense(512, activation='relu', input_shape=(784,)),  # hidden layer with 512 neurons
    layers.Dense(10, activation='softmax')                     # output layer with 10 neurons (10 classes)
])

# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# 6. Train the model
model.fit(x_train, y_train, epochs=5, batch_size=128, validation_split=0.1)

# Evaluate the model on test data
test_loss, test_accuracy = model.evaluate(x_test, y_test)
print(f'Test accuracy: {test_accuracy:.4f}')

# 7. Make predictions
predictions = model.predict(x_test)

# 8. Show some predictions (reshape the flattened images back to 28x28 for display)
for i in range(5):
    plt.imshow(x_test[i].reshape(28, 28), cmap='gray')
    plt.title(f"Predicted: {np.argmax(predictions[i])}, Actual: {np.argmax(y_test[i])}")
    plt.show()

🧠 Key Concepts:
 Sequential API: Used to build models layer-by-

layer.
 Dense layer: Fully connected neural network
layer.
 Activation functions: ReLU for hidden layers,
softmax for multi-class output.
 Loss function: categorical_crossentropy for
multi-class classification.
 Optimizer: adam, a widely used gradient
descent optimizer.

The MNIST dataset (Modified National Institute of Standards and


Technology database) is a benchmark dataset in machine learning
and computer vision, especially used for image classification
tasks.

📚 Overview of the MNIST Dataset

 Type: Handwritten digit images (0 to 9)
 Number of classes: 10 (digits 0 through 9)
 Image size: 28 x 28 pixels (grayscale)
 Number of training samples: 60,000
 Number of test samples: 10,000
 Data format: Each image is a 28×28 pixel grayscale image (784 features if flattened)
 Label format: A single digit (0–9)
Example Images from MNIST

Each image looks like a small, pixelated version of a handwritten


digit. Here's a visual example (you'd need to plot it in Python to
see):
python
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import mnist

(x_train, y_train), _ = mnist.load_data()


plt.imshow(x_train[0], cmap='gray')
plt.title(f"Label: {y_train[0]}")
plt.show()

Convolutional Neural Networks (CNNs):


Convolutional Neural Networks (CNNs) are a class
of deep neural networks that are particularly
effective for processing data with a grid-like
topology, such as images.
They're most commonly used in computer vision
tasks like image classification, object detection,
segmentation, and more.

Key Concepts of CNNs


1. Convolutional Layer
 The core building block of a CNN.
 Applies filters (kernels) that slide over the input
data to detect features like edges, textures, or
shapes.
 Each filter produces a feature map (activation

map) that highlights where that feature occurs.


2. ReLU (Rectified Linear Unit)
 Applies a non-linear activation function to

introduce non-linearity.
 Commonly: ReLU(x) = max(0, x)

3. Pooling Layer
 Downsamples the feature maps to reduce

dimensionality and computation.


 Common types:

o Max Pooling: takes the maximum value in

each window.
o Average Pooling: takes the average.

4. Fully Connected (Dense) Layer


 After several convolution and pooling layers, the

output is flattened and passed through one or


more dense layers for classification or regression
tasks.
5. Softmax / Sigmoid (Output Layer)
 Used in the final layer depending on the task:

o Softmax for multi-class classification.

o Sigmoid for binary classification.

Typical CNN Architecture


Input Image → [Conv → ReLU → Pool]*N →
Flatten → Dense → Output
Example: Image Classification
For a handwritten digit recognition task (like
MNIST), a CNN might:
 Detect edges in the first layer.

 Identify digit parts (like curves or lines) in

deeper layers.
 Combine these to classify the final digit.

Libraries to Use CNNs


 TensorFlow / Keras

 PyTorch

 OpenCV (for pre-processing and some model

loading)
CNN Example in Python
(TensorFlow/Keras):
import tensorflow as tf
from tensorflow.keras import layers, models

# Load and preprocess the MNIST dataset
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Normalize pixel values to 0-1 and reshape for CNN input
x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255
x_test = x_test.reshape(-1, 28, 28, 1).astype("float32") / 255

# Build the CNN model
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')  # 10 classes for digits 0-9
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
history = model.fit(x_train, y_train, epochs=5, validation_split=0.1)

# Evaluate the model
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test accuracy: {test_acc:.4f}")
What This Does:
Conv2D: Applies filters to detect features.

MaxPooling2D: Reduces spatial dimensions.

Flatten: Converts 2D data to 1D for the dense layer.

Dense: Fully connected layers for classification.

Recurrent Neural Networks (RNN).


What is an RNN?
A Recurrent Neural Network (RNN) is a
type of neural network designed to work with
sequential data, such as:
 Text

 Time series (like stock prices)

 Audio

 Video frames

RNNs remember past information using


loops in the network, which makes them good
at modeling sequences and time dependencies.

How Do RNNs Work?


Unlike traditional neural networks (like
feedforward networks), RNNs have memory
of previous inputs.
For a sequence of inputs:
x₁, x₂, x₃, ..., xₜ

An RNN processes each input step by step:

At time t:
Input: xₜ
Hidden state: hₜ = f(W·xₜ + U·hₜ₋₁ + b)
Output: yₜ = g(hₜ)
 hₜ: Hidden state (the memory of the network)
 W: Input weight matrix
 U: Recurrent weight matrix applied to the previous hidden state hₜ₋₁
 b: Bias term
 f: Activation function (e.g., tanh or ReLU)
 g: Output function

This loop allows RNNs to carry information


from earlier in the sequence to later steps.
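
A tiny NumPy sketch of this recurrence is shown below; the dimensions and random weights are assumptions used only to trace the computation step by step.
python
# Tiny sketch of the RNN recurrence h_t = tanh(W x_t + U h_{t-1} + b)
import numpy as np

rng = np.random.RandomState(0)
hidden_size, input_size = 4, 3

W = rng.randn(hidden_size, input_size)   # input weight matrix
U = rng.randn(hidden_size, hidden_size)  # recurrent weight matrix
b = np.zeros(hidden_size)                # bias

h = np.zeros(hidden_size)                # initial hidden state h_0
sequence = [rng.randn(input_size) for _ in range(5)]  # x_1 ... x_5

for t, x_t in enumerate(sequence, start=1):
    h = np.tanh(W @ x_t + U @ h + b)     # the same weights are reused at every time step
    print(f"h_{t} =", np.round(h, 3))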

Applications of RNNs
 Language modeling and generation
(e.g., GPT-style models started with RNN
ideas)
 Machine translation

 Speech recognition

 Stock price prediction

 Music generation

Example of a Recurrent Neural Network (RNN):
Here is an example of an RNN using Python and TensorFlow/Keras. It shows how to use an RNN for sequence prediction, specifically predicting the next number in a sequence.

Goal:
Given a sequence of numbers, the RNN learns
to predict the next number.
For example, given [1, 2, 3], predict 4.
Step-by-Step RNN Example in Python
(with TensorFlow/Keras)
python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

# 1. Prepare the data
# Input sequences (X): [1, 2, 3], [2, 3, 4], ..., [7, 8, 9]
# Target values (y): 4, 5, ..., 10
X = []
y = []
for i in range(1, 8):
    X.append([i, i + 1, i + 2])
    y.append(i + 3)

# Convert to NumPy arrays
X = np.array(X)
y = np.array(y)

# Reshape X to [samples, time steps, features]
X = X.reshape((X.shape[0], X.shape[1], 1))

# 2. Define the RNN model
model = Sequential([
    SimpleRNN(10, activation='relu', input_shape=(3, 1)),
    Dense(1)
])

# 3. Compile and train
model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=200, verbose=0)

# 4. Predict
test_input = np.array([[8, 9, 10]]).reshape((1, 3, 1))
predicted = model.predict(test_input, verbose=0)

print(f"Predicted next number after [8, 9, 10]: {predicted[0][0]:.2f}")
Explanation:
 Input shape: (samples, time_steps,

features) — here, each sequence has 3


time steps, with 1 feature per step.
 SimpleRNN: A basic RNN layer that

processes sequences.
 Dense: Fully connected output layer

producing a single number.
