Deep Learning: Introduction
Pr. Tarik Fissaa
DATA – INE2
Academic Year: 2022/2023
What is Deep Learning ?
Why Deep Learning and Why Now ?
Why Deep Learning ?
Hand-engineered features are time-consuming, brittle, and not scalable in practice.
Can we learn the underlying features directly from data?
Why Now?
Neural networks date back decades, so why the resurgence?
1. Big Data: larger datasets; easier collection & storage
2. Hardware: graphics processing units (GPUs); massively parallelizable
3. Software: improved techniques; new models; toolboxes
The Perceptron
The structural building block for deep learning
The Perceptron: Forward Propagation
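The forward pass of a perceptron is a weighted sum of the inputs plus a bias, passed through a non-linear activation: ŷ = g(w0 + Σi xi wi). A minimal NumPy sketch (the weights, bias, and inputs are illustrative values, not from the slides):

import numpy as np

def perceptron_forward(x, w, b):
    # Weighted sum of inputs plus bias, then a non-linear activation (sigmoid here)
    z = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([2.0, -1.0])   # two input features (illustrative)
w = np.array([3.0, -2.0])   # weights (illustrative)
b = 1.0                     # bias
print(perceptron_forward(x, w, b))   # a single output in (0, 1)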
Common Activation Functions
Importance of Activation Functions
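Activation functions introduce non-linearity into the network; without them, any stack of layers collapses into a single linear transformation. A small sketch of three common choices (sigmoid, tanh, ReLU):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # squashes values to (0, 1)

def tanh(z):
    return np.tanh(z)                  # squashes values to (-1, 1)

def relu(z):
    return np.maximum(0.0, z)          # zero for negative inputs, identity otherwise

z = np.linspace(-3, 3, 7)
print(sigmoid(z), tanh(z), relu(z), sep="\n")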
The Perceptron: Example
Building neural networks with Perceptrons
The Perceptron: Simplified
Multi Output Perceptron
Because all inputs are densely connected to all outputs, these layers are called Dense layers.
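A multi-output perceptron (Dense layer) is several perceptrons sharing the same inputs, so the forward pass is a matrix-vector product. A minimal sketch with illustrative shapes:

import numpy as np

def dense(x, W, b):
    # Every input is connected to every output: z = W x + b
    return W @ x + b

x = np.random.randn(3)       # 3 input features
W = np.random.randn(2, 3)    # 2 outputs, each with its own row of weights
b = np.random.randn(2)
print(dense(x, W, b))        # 2 outputs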
Single Layer Neural Network
Deep Neural Network
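A single layer network inserts one hidden Dense layer between inputs and outputs; a deep network stacks several, with a non-linearity between them. A minimal sketch (layer sizes are illustrative, ReLU assumed as the hidden activation):

import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def dense(x, W, b):
    return W @ x + b

rng = np.random.default_rng(0)
sizes = [2, 4, 4, 1]   # input -> two hidden layers -> output (illustrative)
params = [(rng.standard_normal((m, n)), rng.standard_normal(m))
          for n, m in zip(sizes[:-1], sizes[1:])]

def forward(x, params):
    # Each Dense layer is followed by a non-linearity, except the output layer
    for W, b in params[:-1]:
        x = relu(dense(x, W, b))
    W, b = params[-1]
    return dense(x, W, b)   # raw output (pass through sigmoid for a probability)

print(forward(np.array([4.0, 5.0]), params))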
Applying Neural Networks
Example
Will I pass this class?
Let's start with a simple two-feature model:
𝑥1 = number of lectures you attend
𝑥2 = hours spent on the final project
Example problem: Will I pass this class?
[Scatter plot of past data: 𝑥1 = number of lectures you attend (horizontal axis), 𝑥2 = hours spent on the final project (vertical axis)]
Quantifying Loss
The loss of our network measures the cost incurred from incorrect predictions
Empirical Loss
The empirical loss measures the total loss over our entire dataset
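Concretely, the empirical loss averages the per-example loss over the n training examples: J(W) = (1/n) Σi L(f(x(i); W), y(i)). A minimal sketch, with the per-example loss left as a pluggable function:

import numpy as np

def empirical_loss(predictions, targets, loss_fn):
    # Average the per-example loss over the whole dataset
    return np.mean([loss_fn(p, t) for p, t in zip(predictions, targets)])

squared_error = lambda p, t: (p - t) ** 2
print(empirical_loss([0.1, 0.8, 0.6], [0, 1, 1], squared_error))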
Binary Cross Entropy Loss
Cross entropy loss can be used with models that output a probability between 0 and 1.
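For a true label y ∈ {0, 1} and a predicted probability ŷ, the binary cross entropy is −[y log ŷ + (1 − y) log(1 − ŷ)], averaged over the dataset. A minimal NumPy sketch:

import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Clip predictions away from 0 and 1 so the logs stay finite
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.2, 0.7, 0.4])
print(binary_cross_entropy(y_true, y_pred))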
Mean Squared Error Loss (MSE)
Mean squared error loss can be used with regression models that output continuous real numbers.
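The mean squared error averages the squared difference between predictions and targets: J = (1/n) Σi (y(i) − ŷ(i))². A minimal sketch:

import numpy as np

def mean_squared_error(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

print(mean_squared_error(np.array([3.0, -0.5, 2.0]), np.array([2.5, 0.0, 2.0])))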
Training Neural Networks
Loss Optimization
We want to find the network weights that achieve the lowest loss.
Gradient Descent Algorithm
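Gradient descent initializes the weights randomly, then repeatedly steps opposite the gradient, W ← W − η ∇J(W), until convergence. A minimal sketch on a toy quadratic loss (the loss, learning rate η, and step count are illustrative):

import numpy as np

def gradient_descent(grad_fn, w0, lr=0.1, steps=100):
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        w -= lr * grad_fn(w)   # step opposite the gradient
    return w

# Toy loss J(w) = ||w - 3||^2, whose gradient is 2 (w - 3); minimum at w = 3
grad = lambda w: 2.0 * (w - 3.0)
print(gradient_descent(grad, w0=[0.0, 10.0]))   # converges near [3, 3]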
Computing Gradients: Backpropagation
How does a small change in one weight (e.g. 𝑤2) affect the final loss 𝐽(𝑊)?
Repeat this for every weight in the network using gradients from later layers.
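Backpropagation answers this with the chain rule: the effect of a weight on the loss is the product of the local derivatives along the path from that weight to the output. A tiny worked sketch on a two-weight linear chain x → z1 = w1·x → ŷ = w2·z1 with squared-error loss (all values illustrative):

# Forward pass through a tiny linear chain: z1 = w1 * x, y_hat = w2 * z1
x, y = 2.0, 1.0
w1, w2 = 0.5, -1.0
z1 = w1 * x
y_hat = w2 * z1
J = (y_hat - y) ** 2

# Backward pass: apply the chain rule from the loss back to each weight
dJ_dyhat = 2.0 * (y_hat - y)
dJ_dw2 = dJ_dyhat * z1     # dJ/dw2 = dJ/dy_hat * dy_hat/dw2
dJ_dz1 = dJ_dyhat * w2     # propagate through y_hat = w2 * z1
dJ_dw1 = dJ_dz1 * x        # dJ/dw1 = dJ/dz1 * dz1/dw1
print(dJ_dw2, dJ_dw1)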
Neural Networks in Practice:
Optimization
Training Neural Networks is difficult
"Visualizing the loss landscape", Hao Li, Dec 2017
Loss functions can be difficult to optimize
Setting the Learning Rate
How do we deal with this?
Idea 1:
Try lots of different learning rates and see what works "just right"
Idea 2:
Do something smarter!
Design an adaptive learning rate that adapts to the landscape
Adaptive Learning Rates
• Learning rates are no longer fixed
• Can be made larger or smaller depending on:
• How large the gradient is
• How fast learning is happening
• Size of particular weights
• Etc.
Gradient Descent Algorithms
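Popular variants (SGD with momentum, Adagrad, Adadelta, RMSProp, Adam, ...) differ mainly in how they adapt the step size per weight. As one hedged illustration, the core update of RMSProp divides each step by a running average of squared gradients:

import numpy as np

def rmsprop_step(w, grad, cache, lr=0.05, decay=0.9, eps=1e-8):
    # Keep a running average of squared gradients and scale the step by it,
    # so weights with consistently large gradients take smaller steps
    cache = decay * cache + (1 - decay) * grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

w, cache = np.array([0.0, 10.0]), np.zeros(2)
grad = lambda w: 2.0 * (w - 3.0)   # same toy quadratic loss as before
for _ in range(400):
    w, cache = rmsprop_step(w, grad(w), cache)
print(w)                            # moves close to [3, 3]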
Neural Networks in Practice:
Mini-batches
Mini-batches while training
More accurate estimation of the gradient
Smoother convergence
Allows for larger learning rates
Mini-batches lead to faster training
Can parallelize computation + achieve significant speed increase on GPUs
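At each step, the gradient is averaged over a small batch of examples instead of a single example (too noisy) or the full dataset (too expensive). A minimal sketch on a toy linear-regression problem (data, batch size, and learning rate are illustrative):

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 2))
true_w = np.array([2.0, -1.0])
y = X @ true_w + 0.1 * rng.standard_normal(1000)

w, lr, batch_size = np.zeros(2), 0.1, 32
for epoch in range(20):
    perm = rng.permutation(len(X))                   # shuffle once per epoch
    for i in range(0, len(X), batch_size):
        idx = perm[i:i + batch_size]
        Xb, yb = X[idx], y[idx]
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(idx)   # MSE gradient on the batch
        w -= lr * grad
print(w)                                             # close to [2, -1]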
Neural Networks in Practice:
Overfitting
Regularization
What is it?
Technique that constrains our optimization problem to discourage complex models
Why?
Improve generalization of our model on unseen data
Regularization 1: Dropout
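During training, dropout sets each activation in a layer to zero with some probability p (often around 0.5), so the network cannot rely on any single unit; at test time all units are kept. A sketch of the common "inverted dropout" variant, which rescales the kept activations so their expected value is unchanged:

import numpy as np

def dropout(activations, p=0.5, training=True, rng=np.random.default_rng()):
    if not training:
        return activations                        # use all units at test time
    mask = rng.random(activations.shape) >= p     # keep each unit with probability 1 - p
    return activations * mask / (1.0 - p)         # rescale so the expected activation is unchanged

a = np.ones(10)
print(dropout(a, p=0.5))   # roughly half the units zeroed, the rest scaled to 2.0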
Regularization 2: Early Stopping
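Early stopping monitors the loss on a held-out validation set and halts training once it stops improving, since training past that point tends to overfit. A minimal sketch of the bookkeeping, run here on a simulated validation-loss curve:

import numpy as np

def early_stopping(val_losses, patience=3):
    # Return the epoch with the best validation loss, stopping once
    # `patience` epochs pass without any improvement
    best, best_epoch, waited = np.inf, 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch

# Simulated validation loss: improves, then rises as the model overfits
val_losses = [1.0, 0.7, 0.5, 0.45, 0.44, 0.46, 0.50, 0.55, 0.61]
print(early_stopping(val_losses))   # epoch 4, where validation loss was lowest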
Summary: Core Foundations