Hardware Architectures for Artificial Intelligence
(EE4690)
Lecture-2: Machine learning & deep
learning basics
Computer Engineering Lab
Faculty of Electrical Engineering, Mathematics & Computer Science
25 April 2025
Recap
• Course overview
• Introduction to Artificial Intelligence
• AI definition, history, types of AI….
• AI vs Machine Learning vs Deep-Learning
• Applications
• AI Hardware overview
• General hardware: CPU, GPU, FPGA & TPU
• Some existing AI chips
• Challenges
• AI hardware demands
2
Outline
• Basics of Machine Learning
• Definition
• Algorithms
• Machine Learning in a chip development flow
• Challenges
• Basics of Deep Learning
• Single perceptron
• Logic gates implementation
• Multi-layer perceptron
• Comparison of Machine Learning & Deep learning
3
Learning Objective
• At the end of this lecture you should be able to:
• Express machine learning basics and different types of machine learning
algorithms
• Explain single and multi-layered perceptron and concept of neural network
4
What is machine learning?
• Machine Learning (ML) is a field of study that gives computers the ability to
learn without being explicitly programmed (Arthur Samuel, 1959)
• Basically learning through doing
• ML is used when:
• Human expertise does not exist
• Navigating on Mars
• Humans cannot explain their expertise
• Speech recognition
Source: d2h0cx97tjks2p.cloudfront.net
5
What is machine learning?
• How is it different from the traditional programming system?
[Figure: block diagrams comparing traditional programming (input, program, computer, output) with machine learning (input, computer, trained program, test input, output)]
6
Working of machine learning
• Training phase: ML algorithms learn from past instances of data
• Through statistical analysis and pattern matching
Data for training → ML algorithm → Trained ML model
[Figure caption: Practice/Training]
• Prediction phase: Provides predicted results based on the learned data
New input data → Trained ML model → Desired output
[Figure caption: Real soccer match]
• Data is the core backbone of ML algorithms
7
Key Components of Machine Learning
• Data: For the training purpose
• More data are desirable in order to develop powerful models
• Important to have the right data
• Model: To transform the data
• Ingests data of one type & produces predictions of a possibly different type
• Objective function: Quantifies how well the model is doing
• Function to be maximized or minimized in a specific optimization problem
• Algorithm: To optimize the objective function
• During the training process, perturb certain parameters and observe the outcome
• Update them in the direction that minimizes the loss function
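The perturb-and-observe idea in the last two bullets can be sketched in a few lines. This is only an illustration under stated assumptions: a one-parameter model y = w * x, a mean-squared-error objective, toy data, and hypothetical names (`loss`, `train`, `step`, `eps`) that are not from the lecture.

```python
def loss(w, data):
    """Objective function (assumed): mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def train(data, w=0.0, step=0.05, eps=1e-4, iterations=200):
    """Perturb w, observe how the loss changes, and move downhill."""
    for _ in range(iterations):
        # Perturb the parameter in both directions and observe the outcome.
        slope = (loss(w + eps, data) - loss(w - eps, data)) / (2 * eps)
        # Update in the direction that reduces the loss.
        w -= step * slope
    return w


# Toy data generated from y = 3 * x (assumed for illustration only).
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
w_fit = train(data)
print(round(w_fit, 3), loss(w_fit, data) < loss(0.0, data))
```

Practical training algorithms usually compute the slope analytically (gradient descent) rather than by perturbation, but the update rule has the same shape.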
8
Quiz
• Which of the following correctly represents a machine learning application?
1. Self-driving cars
2. Virtual personal assistant
3. Traffic prediction
4. All of them
5. None of them
Answer: 4 (All of them)
9
Machine learning algorithms
10
Supervised learning
• Machine uses data which is already tagged with the correct answer
• That is, the dataset on which we train our model is labeled
• After that, the machine is provided with a new set of data
• Applied to varied computer vision techniques and imagery analysis
Example
Training data → basket containing fruits
Prediction → outcome from labeled data
11
Disadvantages of Supervised ML algorithm
• Limited performance on very complex problems
• It cannot discover structure in the data on its own
• All new data must belong to one of the given classes
• Training the model is computationally expensive
12
Unsupervised Learning
• Learns patterns from untagged (or unlabeled) data
• Machine itself finds the hidden structure & interprets it
• It is used for clustering, dimensionality reduction, feature
learning, density estimation, etc.
Example:
→ A stick with a cap is a pen
→ A stick with no cap is a pencil
13
Disadvantages of unsupervised ML algorithm:
• Results might be less accurate as
• We do not have any labeled data to train from
• The model is learning from raw data without any prior knowledge
• It is also a time-consuming process
• During the learning phase, the algorithm analyses and calculates many possible cases
• The more features there are, the higher the complexity
14
Supervised learning vs Unsupervised learning
Supervised | Unsupervised
Input data is labelled | Input data is unlabelled
Uses training dataset | Uses input dataset only
Used for predictions | Used for analysis
Data is classified based on training dataset | Uses properties of the given data to classify it
Divided into regression & classification | Divided into clustering & association
Known number of classes | Unknown number of classes
Uses off-line analysis of data | Uses real-time analysis of data
15
Semi-supervised learning
• It is a combination of supervised and unsupervised learning
• Uses a small amount of labeled data & a large amount of unlabeled data
• Provides benefits of both unsupervised & supervised learning
• Addresses challenges of finding a large amount of labeled data
• Example: A text document classifier
• Semi-supervised learning is suitable
• Because it is difficult to find a large amount of labeled text documents
16
Reinforcement learning
• Intelligent agents need to take actions in an environment
• The goal is to maximize the cumulative reward
• No training dataset; the agent learns from experience
• Learner receives rewards and punishments for its actions
• Machine automatically maximizes its performance
• It is used in various autonomous systems like
• Intelligent self-driving cars, programming robots for autonomous actions….
17
Quiz
• Which of the following machine learning algorithms do not require data
labelling/tagging at all?
1. Supervised learning
2. Unsupervised learning
3. Semi-supervised learning
4. Options 1 & 3
5. Options 2 & 3
6. Reinforcement learning
Answer: 2 & 6
18
Summary: Types of Machine learning algorithms
Machine Learning
• Supervised: machine learns from training data; task driven (e.g., regression & classification)
• Unsupervised: no training data is required; data driven (e.g., clustering)
• Reinforcement: machine learns on its own; algorithm learning from the environment
19
Linear Regression
• It is a supervised machine learning algorithm that finds the best linear
relationship describing the input data
• To estimate real values based on continuous variable(s)
• Represented by a linear equation:
• Y = a * X + b
where Y → dependent variable
a → slope
X → independent variable
b → intercept
• The coefficients “a” & “b” are derived by minimizing the sum of squared
distances between the data points and the regression line
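A minimal sketch of how “a” and “b” can be obtained, assuming a single input variable, vertical distances to the line (ordinary least squares), and made-up toy data. The function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Return slope a and intercept b that minimize the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least-squares solution for one independent variable.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Toy data lying exactly on Y = 2 * X + 1 (assumed for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a, b = fit_line(xs, ys)
print(a, b)
```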
20
Logistic Regression
• It predicts the probability of occurrence of an event by fitting data to a
logistic function (the inverse of the logit)
• Estimates discrete values (binary values like true/false) based on a given set
of independent variables
• Ex: To predict whether a person is suffering
from Covid-19 or not
• Symptoms such as shortness of breath,
sore throat, cold, headache and chest
pain are the independent variables
• The dependent variable will be 0 or 1.
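A minimal sketch of the idea, assuming two binary input features, made-up toy labels (not medical data), and plain stochastic gradient descent on the log-loss. The names `predict_proba` and `train` and the learning-rate values are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Probability that the dependent variable is 1."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return sigmoid(z)


def train(samples, labels, lr=0.5, epochs=1000):
    """Fit weights and bias with per-sample gradient steps on the log-loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            error = predict_proba(weights, bias, features) - label
            weights = [w - lr * error * f for w, f in zip(weights, features)]
            bias -= lr * error
    return weights, bias


# Toy data (assumed): the label simply follows the first feature.
samples = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels = [0, 0, 1, 1]
weights, bias = train(samples, labels)
print(predict_proba(weights, bias, [1, 0]) > 0.5)
```

The predicted probability is turned into a 0/1 decision by comparing it with a threshold, here 0.5.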
21
Decision Tree Learning
• It is a type of supervised learning algorithm
• Can be used in both classification and regression problems
• It creates a training model that can predict the class or value of the target
variable by learning simple decision rules inferred from training data
• To predict, it starts from the root of the tree
• Compare the root's attribute with the record's attribute
• Follow the matching branch
• Jump to the next node
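The prediction steps above can be sketched as a walk from the root to a leaf. The tree below is hand-written and purely hypothetical (attribute names, thresholds and labels are assumptions); a real decision tree learner would infer these rules from training data.

```python
# Hypothetical tree: each internal node tests one attribute against a threshold.
tree = {
    "attribute": "diameter_cm",
    "threshold": 5.0,
    "left": {"label": "cherry"},          # diameter_cm < 5.0
    "right": {                            # diameter_cm >= 5.0
        "attribute": "redness",
        "threshold": 0.5,
        "left": {"label": "orange"},      # redness < 0.5
        "right": {"label": "apple"},      # redness >= 0.5
    },
}


def predict(node, record):
    """Start at the root, compare attributes, follow branches until a leaf."""
    while "label" not in node:
        if record[node["attribute"]] < node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["label"]


print(predict(tree, {"diameter_cm": 7.5, "redness": 0.9}))
```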
22
Support Vector Machines (SVM)
• SVM is a supervised machine learning algorithm
• Can be used for both classification and regression problems
• Goal of the SVM algorithm is to find a hyperplane in an N-dimensional space (N
→ number of features) that distinctly classifies the data points
• Specifically, the hyperplane that has the maximum margin
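To make “maximum margin” concrete, the sketch below measures the margin of two candidate hyperplanes on toy 2-D data and keeps the wider one. It does not train an SVM; the data, the candidate hyperplanes and the function names are all assumptions for illustration.

```python
import math


def signed_distance(w, b, x):
    """Signed distance of point x from the hyperplane w . x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def margin(w, b, points, labels):
    """Smallest label-signed distance; positive only if every point is on its correct side."""
    return min(y * signed_distance(w, b, x) for x, y in zip(points, labels))


# Toy data (assumed): class -1 on the left, class +1 on the right.
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 3.0), (5.0, 3.0)]
labels = [-1, -1, 1, 1]

# Two hypothetical separating hyperplanes, given as (w, b).
candidates = {
    "x1 = 3": ((1.0, 0.0), -3.0),
    "x1 = 2.5": ((1.0, 0.0), -2.5),
}
best = max(candidates, key=lambda name: margin(*candidates[name], points, labels))
print(best, margin(*candidates[best], points, labels))
```

Both candidates separate the classes, but the first one leaves more room to the nearest points, which is the property an SVM optimizes for.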
23
k-Nearest Neighbors (kNN)
• The kNN algorithm puts a new case into the most similar category
• kNN is a non-parametric algorithm
• It does not make any assumption about the underlying data
• Also called a lazy learner algorithm as it does not learn immediately
• Instead it stores the dataset & performs the computation at the time of classification
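A minimal sketch of the lazy-learner behaviour, assuming Euclidean distance, a majority vote among the k nearest stored points, and made-up 2-D data. The name `knn_classify` is hypothetical.

```python
import math
from collections import Counter


def knn_classify(training, query, k=3):
    """training: list of (features, label). Returns the majority label of the k nearest points."""
    # No training step: all work happens here, at classification time.
    by_distance = sorted(training, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Toy dataset (assumed): two well-separated categories.
training = [
    ((1.0, 1.0), "A"), ((1.0, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 6.0), "B"), ((6.0, 7.0), "B"),
]
print(knn_classify(training, (2.0, 2.0)), knn_classify(training, (6.0, 5.0)))
```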
24
K-Means
• K-means clustering is one of the simplest unsupervised machine learning algorithms
• The k-means algorithm mainly performs two tasks:
• Determines the best positions for the K centroids (center points) by an iterative process
• Assigns each data point to its closest centroid
• Points that are near a particular centroid form a cluster
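The two tasks can be sketched as an alternating loop. This is a simplified illustration, assuming Euclidean distance, a fixed number of iterations, and hand-picked initial centroids (real implementations usually choose them randomly and stop on convergence). The data and names are hypothetical.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and moving centroids."""
    for _ in range(iterations):
        # Task 2: assign each data point to its closest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Task 1: move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Toy data (assumed): two visible groups, K = 2.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```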
25
Random Forest
• Random Forest is a classifier that contains a number of decision trees
• Averages their predictions to improve the predictive accuracy on the dataset
• Less training time & high accuracy, even with less data
26
Machine Learning in Chip Development
Chip development stages: Design, EDA, Fabrication, Test
(EDA → Electronic design automation)
▪ Design: design verification, margin prediction, statistical analysis, ………
▪ EDA: physical design, failure modeling, ………
▪ Fabrication: yield enhancement, optical proximity correction, inverse lithographic techniques, process checking, ………
▪ Test: chip testing, failure modeling, test generation, ………
27
Benefits of Machine learning
• Easily identify trends & patterns:
• Example: Understand browsing history and purchase behaviour (e.g., Amazon)
• No human intervention needed
• Providing the machine the ability to learn
• Scope for improvement
• Ability to improve with increasing data (e.g., weather prediction)
• Hardware (CPUs, GPUs, etc.) with ML capabilities for faster processing
• Various software libraries & UIs for developing more efficient algorithms
• Efficient handling of data
• Good at handling data that are multi-dimensional and multi-variety
• Wide applications
• Role everywhere from business, medical, banking to science & technology
28
Machine learning challenges
• More data:
• A huge amount of data is needed to train a model
• Example: For image classification, thousands of images are required to train a model
• More computation:
• Requires more resources, even to compute simple tasks
• More power consumption & storage units
• Possibility of high error
• New & better algorithms are required to improve the efficiency
• Time
• Resolving errors takes a lot of time because the data is huge
29
Need of Deep Learning
• Deep Learning is a subset of Machine Learning
• Deep learning describes algorithms that analyze data with a logic structure
similar to how a human would draw conclusions
30
Course: Homework
o Assignment:
• One paper from “Machine Learning in VLSI CAD” Book
• List of papers will be given in Brightspace
• Questions:
• Please name the paper that you have chosen.
• Give summary of the paper (not more than half page).
• What are the key contributions of the paper?
• Which machine learning algorithm is employed and how is it used?
• Describe the result generation setup and explain the key results.
• Specify the drawbacks for their proposed method.
• Deadline: 06th May 2024 (2359 Hrs).
31
Deep Learning
32
McCulloch-Pitts Neuron Model
• First computational model of a neuron was proposed in 1943
• Warren McCulloch (neuroscientist) and
• Walter Pitts (logician)
• Mimicking the functionality of a biological neuron
• A set of synapses (connections)
• Processing unit sums the inputs and applies an activation function
[Figure: neuron model with input signals, synaptic weights, bias, summing junction, activation function and output]
33
The Perceptron
[Figure: input layer (x1, x2, …, xd) feeding forward to output layer (O1, O2, …, OM)]
• Many McCulloch-Pitts neurons can be connected together
• An arrangement of one input layer of McCulloch-Pitts neurons feeding
forward to one output layer is known as a perceptron
34
Example: Implementing logic gates
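NOT
Input | Output
0 | 1
1 | 0
AND
x1 | x2 | y
0 | 0 | 0
0 | 1 | 0
1 | 0 | 0
1 | 1 | 1
[Figure: perceptron with inputs x1 and x2 (weights W1 = ?, W2 = ?) and a constant input 1 (bias weight = ?) feeding a summing junction ∑ with output v1, followed by a threshold activation function producing y; the AND plot shows one decision boundary]
Network can be trained for appropriate weights and thresholds to classify
correctly the different classes
→ by creating decision boundaries between classes
AND gate: W1 = 1; W2 = 1; Bias = -1.5
Consider threshold activation function (1 if v1 >= 0; 0 otherwise)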
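The AND values above can be checked with a few lines of code. This is a minimal sketch: the AND weights and bias are the ones on the slide, while the NOT values (weight -1, bias 0.5) are only one possible choice assumed here, since the slide leaves them open. The function names are hypothetical.

```python
def perceptron(inputs, weights, bias):
    """Weighted sum plus bias, followed by the threshold activation (1 if v >= 0, else 0)."""
    v = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if v >= 0 else 0


def and_gate(x1, x2):
    # W1 = 1, W2 = 1, Bias = -1.5 (values given on the slide).
    return perceptron((x1, x2), (1, 1), -1.5)


def not_gate(x):
    # Assumed values: weight -1, bias 0.5 (one choice that works, not from the slide).
    return perceptron((x,), (-1,), 0.5)


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, and_gate(x1, x2))
```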
35
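Question: How can we do it for an OR gate?
OR
Input-1 | Input-2 | Output
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 1
For what value of “Bias” can the OR gate be implemented?
=> Discuss with peers
W1 = 1; W2 = 1; Bias = ?
Consider threshold activation function (1 if v1 >= 0; 0 otherwise)
[Figure: perceptron with inputs x1 and x2 (weights W1, W2), a constant input 1 with the bias weight, summing junction ∑ and threshold activation function producing y; the OR plot shows one decision boundary]
Answer: Bias = -0.5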
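One way to explore the bias question is to test a few candidate values against the OR truth table. The sketch assumes W1 = W2 = 1 and the threshold activation from the slide; the candidate list and the function names are hypothetical.

```python
def perceptron(inputs, weights, bias):
    """Weighted sum plus bias, followed by the threshold activation (1 if v >= 0, else 0)."""
    v = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if v >= 0 else 0


OR_TABLE = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}


def implements_or(bias, w1=1, w2=1):
    """True if the perceptron reproduces every row of the OR truth table."""
    return all(perceptron(x, (w1, w2), bias) == y for x, y in OR_TABLE.items())


# Candidate bias values to try (assumed for illustration).
candidates = [-1.5, -1.0, -0.5, 0.0, 0.5]
working = [b for b in candidates if implements_or(b)]
print(working)  # prints [-1.0, -0.5]
```

With these weights, any bias b with -1 <= b < 0 satisfies all four rows; -0.5 sits in the middle of that range.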
36
How can we do it for XOR?
• Realization of a two-input XOR gate (or XNOR gate) is not trivial
XOR
Input-1 | Input-2 | Output
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 0
[Figure: XOR plot with two decision boundaries]
• We need multiple decision boundaries
• Boundaries can be generated by:
• Changing the transfer function
• A more complex network
• Like a multi-layer perceptron
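A minimal sketch of the “more complex network” option: two hidden threshold neurons each provide one decision boundary, and an output neuron combines them. The particular decomposition (OR and NAND in the hidden layer, AND at the output) and all weight values are assumptions chosen for illustration; other choices work too.

```python
def perceptron(inputs, weights, bias):
    """Weighted sum plus bias, followed by the threshold activation (1 if v >= 0, else 0)."""
    v = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if v >= 0 else 0


def xor_gate(x1, x2):
    # Hidden layer: two neurons, i.e. two decision boundaries.
    h_or = perceptron((x1, x2), (1, 1), -0.5)      # fires unless both inputs are 0
    h_nand = perceptron((x1, x2), (-1, -1), 1.5)   # fires unless both inputs are 1
    # Output layer: AND of the two hidden outputs.
    return perceptron((h_or, h_nand), (1, 1), -1.5)


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_gate(x1, x2))
```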
37
Multilayer Perceptron
• These are deep networks consisting of multiple fully connected layers
• Multiple hidden layers are possible, depending on the function of the neural network
• Perform nonlinear transformations of the inputs entered into the network
• A hidden layer works like a biological neuron in the brain
• It takes in its probabilistic input signals and works on them
• It converts them into an output corresponding to the biological neuron's axon
38
A Real Neuron
Neuron terminals | Functionality | System equivalent representation
Dendrite | Receives signal from other neurons | Input terminal
Nucleus | Processes the information | CPU
Axon terminals | Transmits the output of this neuron | Output terminals
Axon | Transmits signal | Wire
Synapse | Point of connection to other neurons | Enabling connection
39
Activation function
40
Activation Function
Tanh function
ReLU (Rectified Linear Unit) activation function
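For reference, the two functions named on this slide, plus the threshold function used in the logic gate examples, can be written as follows. A minimal sketch; the function names are hypothetical.

```python
import math


def threshold(v):
    """Threshold activation from the logic gate examples: 1 if v >= 0, else 0."""
    return 1 if v >= 0 else 0


def tanh(v):
    """Tanh activation: smooth, output in (-1, 1)."""
    return math.tanh(v)


def relu(v):
    """ReLU activation: passes positive values, clips negative values to 0."""
    return max(0.0, v)


for v in (-2.0, 0.0, 2.0):
    print(v, threshold(v), round(tanh(v), 3), relu(v))
```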
41
Importance of Activation Function
• Activation function introduces non-linearity into the network
• Example: Task is to build a network to distinguish red vs green circles
[Figure: decision boundaries for linear activation functions]
[Figure: decision boundaries for non-linear activation functions]
42
Training vs Inference
Source: Nvidia
• Similar to ML, a DNN also has training and inference phases
• Inference → forward propagation
• Training → forward propagation + backward propagation
• The backward propagation phase propagates the error back
• through the network's layers and updates their weights
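The difference between the two phases can be sketched on a tiny network. This is an illustration only, assuming a chain of two sigmoid neurons (one hidden, one output), a squared-error loss, a single made-up training sample, and hypothetical names and hyperparameters.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(p, x):
    """Inference: forward propagation only. Returns hidden and output activations."""
    h = sigmoid(p["w1"] * x + p["b1"])
    y = sigmoid(p["w2"] * h + p["b2"])
    return h, y


def train_step(p, x, target, lr=0.5):
    """Training: forward propagation, then backward propagation and weight update."""
    h, y = forward(p, x)
    # Backward pass for loss = 0.5 * (y - target)^2, using the chain rule.
    dz2 = (y - target) * y * (1.0 - y)     # error at the output neuron
    dz1 = dz2 * p["w2"] * h * (1.0 - h)    # error propagated back to the hidden neuron
    return {
        "w2": p["w2"] - lr * dz2 * h,
        "b2": p["b2"] - lr * dz2,
        "w1": p["w1"] - lr * dz1 * x,
        "b1": p["b1"] - lr * dz1,
    }


params = {"w1": 0.5, "b1": 0.0, "w2": 0.5, "b2": 0.0}
x, target = 1.0, 1.0
before = forward(params, x)[1]
for _ in range(200):
    params = train_step(params, x, target)
after = forward(params, x)[1]
print(before < after <= 1.0)
```

Inference needs only `forward`; training additionally stores the intermediate activation `h` and computes the gradients, which is one reason it demands more memory and computation.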
43
Deep learning example
A Convolutional Neural Network (CNN) is a type of deep learning model for
processing images
Source: DeepMind
44
DNN Applications
45
DNN challenges
• DNN training algorithms require a huge amount of data
• Ex: A single image in the Cityscapes dataset (useful for self-driving cars)
needs 1.5 hrs on average for fine pixel-level annotation
• For 5K images, it is 7500 hrs (~75K Euro)
• Requires high-performance hardware
• To ensure better efficiency
• High processing cost
• Consume a lot of power
• Lack of Flexibility & multitasking
Source: World economic forum
46
Machine Learning versus Deep Learning
Parameters | Machine learning | Deep learning
Accuracy level | Low | High
Data requirement | Can train on less data; needs structured data | Large data for training; doesn't require structured data
Training time | Less (a few hours) | More (a few weeks)
Hardware dependency | Relatively less computation, sometimes CPU can also work | More computation required: GPUs & TPUs
Hyperparameter tuning | Limited | More different ways
Applications | Simple, like forecasting & predicting | Complex, like autonomous vehicles
47
Summary
• Basics of Machine learning
• ML algorithms
• Supervised, unsupervised & reinforcement learning
• ML challenges
• Basics of Deep Learning
• Single vs multi-layer perceptron
• Training & inference of Deep learning
48
Thank you
Any questions?
Lecture-3: DNN models
Day & time: 29th April 2025 at 0845 Hrs
49