Artificial intelligence:
techniques that enable machines to mimic human behavior
Machine learning:
ability to learn without explicit instructions
Deep learning:
extract patterns from raw data using neural networks
Why deep learning:
hand-coding features is time-consuming and does not scale
Why use it now:
-easier to collect and store large amounts of data
-higher performing hardware
-new techniques and frameworks, and new architectures and models
Neural network basics:
Activation functions:
Activation functions introduce nonlinearity into the network. Consider, for example, separating green points from red points: without an activation function the network can only draw a linear decision boundary, so no matter how the points are dispersed it behaves like linear regression. Activation functions allow it to capture higher-level, nonlinear patterns.
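A minimal NumPy sketch of three common activation functions (the function names are just for this illustration), with a comment on why stacking layers without one stays linear:

import numpy as np

def sigmoid(z):            # squashes values into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):               # squashes values into (-1, 1)
    return np.tanh(z)

def relu(z):               # zeroes out negative values
    return np.maximum(0.0, z)

# Without an activation, two "layers" collapse into one linear map:
# W2 @ (W1 @ x) == (W2 @ W1) @ x, so the model can only draw straight
# decision boundaries, exactly like linear regression.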
Multi-layered neural networks:
-multi-output perceptron: instead of a single output we have multiple outputs computed from the same inputs, but with different weights for each output
-single hidden layer network: this network has one hidden layer in which the model learns patterns; each node in the layer computes a weighted sum of the inputs X with its weights W, plus a bias
This value is then passed through an activation function to add nonlinearity and then passed to the output layer, which has its own weights and activation function (see the sketch below)
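A minimal NumPy sketch of this forward pass; the weight shapes, names, and example values are illustrative assumptions:

import numpy as np

def forward(x, W1, b1, W2, b2, g=np.tanh):
    # hidden layer: weighted sum of inputs X with weights W plus a bias,
    # passed through an activation g to add nonlinearity
    z1 = W1 @ x + b1
    h = g(z1)
    # output layer: its own weights and activation
    z2 = W2 @ h + b2
    return g(z2)

x = np.array([0.5, -1.2, 3.0])                  # 3 inputs
W1 = np.random.randn(4, 3); b1 = np.zeros(4)    # 4 hidden nodes
W2 = np.random.randn(2, 4); b2 = np.zeros(2)    # 2 outputs
print(forward(x, W1, b1, W2, b2))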
Quantifying loss:
the loss of the network measures the difference between the predicted value and the actual value for a single example, and is written as L(f(x^(i); W), y^(i)), where f(x^(i); W) is the prediction and y^(i) is the actual value
Empirical loss:
measures the average loss over the entire dataset: J(W) = (1/n)∑ L(f(x^(i); W), y^(i)); also known as the cost function, objective function, or risk
Binary cross entropy:
this loss function is specifically used for classification problems (0/1 output):
J(W) = -(1/n)∑ [ y^(i)·log(ŷ^(i)) + (1 − y^(i))·log(1 − ŷ^(i)) ]
Mean squared error:
this is used for regression models with continuous output:
J(W) = (1/n)∑ (y^(i) − ŷ^(i))²
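A small NumPy sketch of both losses; the epsilon clipping is an added assumption to avoid log(0):

import numpy as np

def binary_cross_entropy(y, y_hat, eps=1e-12):
    # classification loss for targets in {0, 1}
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def mean_squared_error(y, y_hat):
    # regression loss for continuous targets
    return np.mean((y - y_hat) ** 2)

y = np.array([1.0, 0.0, 1.0])
y_hat = np.array([0.9, 0.2, 0.6])
print(binary_cross_entropy(y, y_hat), mean_squared_error(y, y_hat))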
Training neural networks:
training a neural network means updating the weights of the model until the loss (the difference between predicted and actual values) is minimal
Gradient descent:
-initialize weights randomly
-loop until convergence (either we run out of iterations or the improvement becomes too small):
-calculate the gradient ∂J(W)/∂W
-update the weights: W ← W − η·∂J(W)/∂W
where η is the learning rate, which indicates how big a step we take
-return weights
This approach is computationally expensive because the gradient is computed over every training example at each step (see the sketch below)
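A minimal sketch of the gradient descent loop above, assuming a simple linear model with MSE loss so the gradient can be written by hand; the dataset and names are illustrative:

import numpy as np

def gradient_descent(X, y, lr=0.01, max_iters=1000, tol=1e-6):
    w = np.random.randn(X.shape[1])                 # initialize weights randomly
    for _ in range(max_iters):                      # loop until convergence
        grad = 2.0 / len(y) * X.T @ (X @ w - y)     # gradient of MSE over ALL examples
        step = lr * grad                            # lr (eta) controls the step size
        w -= step                                   # update the weights
        if np.linalg.norm(step) < tol:              # improvement too small -> stop
            break
    return w                                        # return weights

X = np.random.randn(100, 3)
y = X @ np.array([1.0, -2.0, 0.5])
print(gradient_descent(X, y))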
Stochastic gradient descent:
instead of using the entire dataset like regular gradient descent, stochastic gradient descent picks a single random example at each step, updates the weights based on it, and continues until convergence; the updates are noisier because each one is based on a single random example
Mini-batch gradient descent:
this one is a compromise between regular and stochastic gradient descent: it picks a small batch of examples, computes the gradient on that batch, and updates the weights (see the sketch below)
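Only the gradient computation changes between the variants; a sketch of a single mini-batch update that could replace the full-batch gradient step in the loop above (batch size 1 would give stochastic gradient descent, the full set would give regular gradient descent):

import numpy as np

def minibatch_step(X, y, w, lr=0.01, batch_size=32):
    idx = np.random.choice(len(y), size=batch_size, replace=False)  # pick a small part of the set
    Xb, yb = X[idx], y[idx]
    grad = 2.0 / batch_size * Xb.T @ (Xb @ w - yb)  # gradient on the mini-batch only
    return w - lr * grad                            # noisier than full-batch, cheaper per step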
Batch normalization:
Batch normalization is a technique that helps models train faster and more stably; the distribution of inputs to each layer keeps changing during training, which is known as internal covariate shift
Normalization:
Also known as min-max scaling, it compresses all values to lie between 0 and 1 while preserving the shape of the data
Standardization:
transforms the data so that the mean is 0 and the variance is 1 (it rescales the data but does not by itself make the distribution normal)
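Both transforms in a couple of NumPy lines (x is an arbitrary 1-D array of example values):

import numpy as np

x = np.array([3.0, 7.0, 10.0, 50.0])
minmax = (x - x.min()) / (x.max() - x.min())    # normalization: values in [0, 1], shape preserved
standard = (x - x.mean()) / x.std()             # standardization: mean 0, variance 1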
Batch normalization process:
-select a mini batch
-compute mean : μ = (1/m)∑(x_i)
-compute variance: σ² = (1/m)∑(x_i - μ)²
-normalize: x̂_i = (x_i - μ)/√(σ² + ε), where ε prevents division by zero
-scale and shift: y_i = γx̂_i + β where γ is a learnable scale and β is a learnable shift
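A NumPy sketch of the four steps above for one mini-batch of a single feature; γ and β would be learned during training, here they are just fixed defaults:

import numpy as np

def batch_norm(x_batch, gamma=1.0, beta=0.0, eps=1e-5):
    mu = x_batch.mean()                             # mini-batch mean
    var = x_batch.var()                             # mini-batch variance
    x_hat = (x_batch - mu) / np.sqrt(var + eps)     # normalize (eps prevents division by zero)
    return gamma * x_hat + beta                     # learnable scale and shift

print(batch_norm(np.array([2.0, 4.0, 6.0, 8.0])))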
Benefits:
-centers input around 0
-achieve accuracy faster
-better performance
-removes the need for a separate standardization step
-adds a slight regularization effect, reducing the need for other regularization
-each epoch takes longer, but the model converges in fewer epochs
Computer vision:
the goal is to discover what is going on in the world from visual data and to predict upcoming events based on it. This field has applications in medicine, automation, accessibility, and robotics
What do computers see:
to a computer, an image is a matrix of numbers with a width, a height, and a depth; the depth is the number of channels. An RGB image has three channels, so a 1080*1080 image is stored as 1080*1080*3
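For example, in NumPy (the pixel values are random stand-ins):

import numpy as np

rgb_image = np.random.randint(0, 256, size=(1080, 1080, 3), dtype=np.uint8)
print(rgb_image.shape)                  # (1080, 1080, 3): height x width x 3 color channels
gray_image = rgb_image.mean(axis=2)     # a grayscale image has a single channel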
Computer vision tasks:
Regression: predicting a continuous value
Classification: predicting the class of an object
Manual feature extraction:
to identify features manually, we first need domain knowledge, then a definition of the feature we
want to extract, then we detect features and classify them
Fully connected neural network:
a fully connected network has every neuron in the current layer connected to every neuron in the previous layer
in this architecture, the input image is flattened into a vector of pixel values; this loses the spatial structure, and being fully connected means there are a lot of weights
to solve this issue, we connect only a patch of the input to each neuron in the hidden layer, basically giving it a window onto the original input
we slide this window across the whole input to define the connections
this technique is called convolution (see the sketch below)
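A naive NumPy sketch of convolution: slide a small window (the filter) over the input and compute a weighted sum at each position (no stride or padding handling, just the idea):

import numpy as np

def convolve2d(image, kernel):
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i+kh, j:j+kw]           # the window of the original input
            out[i, j] = np.sum(patch * kernel)      # weighted sum -> one output value
    return out

edge_filter = np.array([[-1.0, 0.0, 1.0]] * 3)      # a simple vertical-edge kernel
print(convolve2d(np.random.rand(5, 5), edge_filter).shape)   # (3, 3)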
CNN:
this is a neural network that learns features by applying filters (kernels) and is usually composed of:
-convolutional layers for feature extraction
-an activation function to add nonlinearity, usually ReLU
-pooling, either max or average, to downsize the resulting feature maps
for a neuron in the hidden layer:
-it takes a patch
-calculates the weighted sum over the patch and adds a bias
The output of a convolutional layer is a volume whose height and width are spatial dimensions and whose depth is the number of filters, i.e. the number of feature maps extracted
The ReLU activation function replaces all negative values after a convolution with 0
Pooling:
pooling is downsampling while retaining spatial invariance; it is done by sliding a window across the feature maps and keeping the maximum value (max pooling) or the average value (average pooling), as in the sketch below
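A sketch of ReLU followed by 2*2 max pooling on a small feature map (NumPy; a stride equal to the pool size is assumed):

import numpy as np

def relu(fmap):
    return np.maximum(fmap, 0.0)                    # negative values become 0

def max_pool(fmap, size=2):
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            window = fmap[i*size:(i+1)*size, j*size:(j+1)*size]
            out[i, j] = window.max()                # keep only the strongest response
    return out

fmap = np.random.randn(4, 4)
print(max_pool(relu(fmap)))                         # 2x2 downsampled feature map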
CNN for classification:
after convolution, activation, and pooling, the resulting features are passed to a fully connected layer for classification, which outputs the probability of the image belonging to each class
LeNet-5:
-introduced in 1998 (earlier LeNet versions date back to 1989) for handwritten character recognition, but was limited by the hardware of the time
Key features:
-convolutional layers extract features
-tanh activation function
-fully connected layer for classification
-sparse connections reduce complexity
Architecture:
two convolutional layers, each followed by a pooling (subsampling) layer, then fully connected layers and an output layer (see the sketch below)
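A hedged PyTorch-style sketch of the commonly cited LeNet-5 layout described above (32*32 grayscale input, tanh activations, pooling after each convolution, fully connected layers at the end); exact filter counts follow the widely used version:

import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5), nn.Tanh(), nn.AvgPool2d(2),    # 32x32 -> 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(), nn.AvgPool2d(2),   # 14x14 -> 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
            nn.Linear(120, 84), nn.Tanh(),
            nn.Linear(84, num_classes),             # fully connected layers for classification
        )

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))

print(LeNet5()(torch.randn(1, 1, 32, 32)).shape)    # torch.Size([1, 10])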
Advantages:
-simple and efficient for small datasets
-low complexity
-demonstrates effectiveness of CNNs
Disadvantages:
-limited to small inputs (32*32)
-not effective for complex datasets
-requires preprocessing
VGG-Net:
developed by the Visual Geometry Group and introduced in 2014, with either 16 or 19 layers; used for image recognition and detection
VGG-19 adds 3 extra convolutional layers over VGG-16, which gives slightly better performance
Advantages:
-easy to implement
-small (3*3) filters extract better features with reduced computational load
-widely available pre trained models
-versatile
Disadvantages:
-very slow to train
-heavy memory and compute requirements
-inefficient due to its very large number of parameters
Applications:
-pre trained models in transfer learning
-object detection and feature extraction
GoogLeNet:
introduced in 2014, also known as Inception v1; it focuses on both accuracy and efficiency
Why it was introduced:
-shallow architectures like LeNet-5 and AlexNet were reaching their limits
-vanishing gradients
-high computational costs in deeper networks
Architectural highlights:
Uses inception modules for parallel processing at different scales
Uses global average pooling instead of large fully connected layers
Uses auxiliary classifiers during training to combat vanishing gradients
What are inception modules:
introduced with GoogLeNet; they capture features more efficiently than earlier models by processing the input through several filter sizes in parallel (see the sketch below)
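A simplified PyTorch sketch of one inception module: the same input goes through 1*1, 3*3, and 5*5 convolution branches plus a pooling branch in parallel, and the resulting feature maps are concatenated; the channel counts here are arbitrary assumptions, not GoogLeNet's exact ones:

import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, 16, kernel_size=1)          # 1x1 conv
        self.branch3 = nn.Sequential(                               # 1x1 reduce, then 3x3
            nn.Conv2d(in_ch, 16, kernel_size=1), nn.ReLU(),
            nn.Conv2d(16, 24, kernel_size=3, padding=1))
        self.branch5 = nn.Sequential(                               # 1x1 reduce, then 5x5
            nn.Conv2d(in_ch, 8, kernel_size=1), nn.ReLU(),
            nn.Conv2d(8, 8, kernel_size=5, padding=2))
        self.branch_pool = nn.Sequential(                           # pooling branch
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_ch, 8, kernel_size=1))

    def forward(self, x):
        # run all branches on the same input in parallel and stack along the channel dimension
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.branch_pool(x)], dim=1)

print(InceptionModule(32)(torch.randn(1, 32, 28, 28)).shape)         # torch.Size([1, 56, 28, 28])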
Evolution of inception architecture:
-Inception-v2: batch normalization and more efficient convolutions
-Inception-v3: factorized convolutions
-Inception-v4 and ResNet hybrids: residual connections for better training
What is GoogLeNet used for:
-image classification
-object detection
-image segmentation
-video analysis
Advantages:
-efficient feature extraction
-reduced parameters
-improved accuracy
Disadvantages:
-complex architecture
-requires significant expertise to customize
-computationally expensive during inference
Legacy of GoogLeNet:
-set a foundation for modular architectures
-paved the way for hybrid models such as the Inception-ResNet variants
-inspired lightweight models
Residual networks:
early architectures added more layers to reduce error rates, but this led to new problems such as vanishing gradients and higher computational cost
Why ResNets:
the issue was that deeper networks tend to perform worse than shallower networks due to vanishing or exploding gradients
This was solved by introducing residual blocks with skip connections
Residual blocks and skip connections:
provide two paths for information to flow, effectively allowing the network to bypass layers it deems unnecessary; this lets the network learn residuals and mitigates vanishing and exploding gradients (see the sketch below)
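A PyTorch sketch of a basic residual block, assuming equal input and output channel counts: one path goes through the stacked layers, while the skip connection adds the input straight back, so the layers only need to learn the residual:

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))   # path 1: the stacked layers
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)                  # path 2: skip connection adds x back

print(ResidualBlock(16)(torch.randn(1, 16, 8, 8)).shape)   # torch.Size([1, 16, 8, 8])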
How resnet works:
inspired by the VGG-19 architecture, the original ResNet is a 34-layer network with shortcut (skip) connections; the CIFAR dataset was used to test depths of up to 1000 layers
Why do they work:
-overcome problems with very deep networks
-poorly performing layers can be bypassed by the network
-superior accuracy
Sequence modeling applications:
-machine translation
-image captioning
-sentiment classification
Neurons with recurrence:
in a neuron with recurrence, the neuron maintains an internal state called the hidden state; each output depends on the current input and on the previous state (the context).
Recurrent neural networks :
in an RNN, information about past inputs is maintained through a state. At every step while processing a sequence, a recurrence relation is applied: h_t = f_W(h_{t-1}, x_t), where h_t is the new state, h_{t-1} the previous state, and x_t the current input
RNN state update and output:
to update the hidden state, we take the input vector and:
apply the tanh activation function to the hidden-to-hidden weights times the previous state plus the input-to-hidden weights times the input, which gives the new state: h_t = tanh(W_hh·h_{t-1} + W_xh·x_t)
To get the output:
we multiply the hidden-to-output weights by the new state: ŷ_t = W_hy·h_t (see the sketch below)
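A NumPy sketch of a single recurrent step using the update and output rules above; the weight shapes and the example sequence are illustrative assumptions:

import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy):
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)   # new hidden state from previous state + input
    y_t = W_hy @ h_t                            # output from the new state
    return h_t, y_t

hidden, inputs, outputs = 8, 4, 3
W_xh = np.random.randn(hidden, inputs)
W_hh = np.random.randn(hidden, hidden)
W_hy = np.random.randn(outputs, hidden)

h = np.zeros(hidden)
for x_t in np.random.randn(5, inputs):          # process a sequence of length 5
    h, y = rnn_step(x_t, h, W_xh, W_hh, W_hy)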
Backpropagation through time:
when we unroll an RNN, we see that at each time step t it calculates a hidden state and an output; the model then computes a loss L_t by comparing the prediction with the actual value, and the total loss is L = ∑_t L_t
The gradient of this total loss is then backpropagated through the unrolled RNN
The problem of long term dependency:
when computing gradients with BPTT, derivatives are multiplied repeatedly; if these factors are small, the repeated multiplication causes the gradients to shrink exponentially as we move backwards through time. This means errors from long-term dependencies contribute less and less, which biases the network towards short-term dependencies