NN Unit 1

The document provides an overview of the development stages of artificial intelligence (AI), highlighting the evolution from simple logical rule-based systems to advanced machine learning and deep learning techniques. It details the historical milestones in neural networks, including the introduction of shallow and deep neural networks, and discusses the characteristics that differentiate modern deep learning algorithms from traditional methods. Additionally, it compares shallow and deep neural networks in terms of structure, performance, and suitability for various tasks.

1. INTRODUCTION TO ARTIFICIAL INTELLIGENCE


1.1 VARIOUS DEVELOPMENT STAGES OF ARTIFICIAL INTELLIGENCE:
 AI is a technology that allows machines to acquire intelligent and inferential mechanisms
like humans. The concept first appeared at the Dartmouth Conference in 1956. The
development of AI has mainly gone through three stages, each representing an attempt to
realize AI from a different angle.
 In the first stage, people tried to develop intelligent systems by summarizing and
generalizing logical rules and implementing them in the form of computer programs. But
those logical rules were too simple and difficult to use for expressing complex and
abstract concepts. This stage is called the inference period.
 In the 1970s, scientists tried to implement AI through knowledge bases and reasoning.
They built large and complex expert systems to replicate the intelligence of human
experts. One of the biggest difficulties of the inference period was that processes such as
human recognition of pictures and understanding of language cannot be replicated by
hand-crafted rules.
 To solve these problems, a research discipline that allows machines to automatically
learn rules from data, known as machine learning, was born; it became a popular
subject in AI in the 1980s. This is the second stage.
 In machine learning, there is a direction to learn complex, abstract logic through neural
networks.
 The third revival of AI was since 2012 when the applications of deep neural network
technology have made major breakthroughs in fields like computer vision, natural
language processing (NLP), and robotics.
 Some tasks have even surpassed the level of human intelligence. Deep neural networks
eventually got a new name: deep learning. There is no essential difference between neural
networks and deep learning, as deep learning simply refers to models or algorithms based
on deep neural networks.

1.2 THE ADVENT OF NEURAL NETWORKS AND DEEP LEARNING:


 Neural network algorithms are a class of algorithms that learn from data based on neural
networks. They still belong to the category of machine learning. Due to the limitation of
computing power and data volume, early neural networks were shallow.
 Therefore, their expressive ability was limited. The improvement of computing power,
the arrival of big data, and highly parallelized graphics processing units (GPUs) made
training of large-scale neural networks possible.
 In 2006, Geoffrey Hinton first proposed the concept of deep learning.
 In 2012, AlexNet, an eight-layer deep neural network, was released and achieved huge
performance improvements in an image recognition competition. Since then, neural
network models with up to thousands of layers have been developed successively, showing
strong learning ability.
 Algorithms implemented using deep neural networks are generally referred to as deep
learning models. In essence, neural networks and deep learning can be considered the
same.
 The comparison of deep learning with other algorithms can be summarized as follows
(the original notes include a comparison diagram):

 Rule-based systems encode explicit logic, which is generally designed for specific
tasks and is not suitable for other tasks.
 Traditional machine learning algorithms rely on manually designed feature detection
methods, such as SIFT and HOG features, which are suitable for a certain type of
task and have some generality. But the performance highly depends on how those
features are designed.
 The emergence of neural networks has made it possible for computers to design those
features automatically through neural networks without human intervention.
 Shallow neural networks typically have limited feature extraction capability, while deep
neural networks are capable of extracting high-level, abstract features and have better
performance.

1.3 RELATIONSHIP BETWEEN AI, DEEP LEARNING AND MACHINE LEARNING

AI is a technology that allows machines to acquire intelligent and inferential mechanisms
like humans.
● In the inference period, people tried to develop intelligent systems by summarizing
and generalizing logical rules and implementing them in the form of computer
programs. But those logical rules were too simple and difficult to use for expressing
complex and abstract concepts.
● To solve the problems of the inference period, a research discipline that allows
machines to automatically learn rules from data, known as machine learning, emerged.
● In machine learning, there is a direction to learn complex, abstract logic through
neural networks.
● Deep neural networks eventually got a new name – deep learning.

1.4 HISTORY OF NEURAL NETWORKS


1.4.1 SHALLOW NEURAL NETWORKS AND THEIR DEVELOPMENT TIMELINE:
 We divide the development of neural networks into shallow neural network stages
and deep learning stages.
 In 1943, psychologist Warren McCulloch and logician Walter Pitts proposed the
earliest mathematical model of neurons based on the structure of biological
neurons, called the MP neuron model after their last-name initials.
 The model is f(x) = h(g(x)), where g(x) = ∑ xi, xi ∈ {0, 1}; the value of g(x) is used to
predict the output value.
 If g(x) ≥ 0, the output is 1; if g(x) < 0, the output is 0.
 The MP neuron model has no learning ability and can only complete fixed logic
judgments.
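
As a concrete illustration of this fixed-logic behavior, here is a minimal NumPy sketch of an MP-style neuron. The explicit threshold parameter and the AND/OR demonstrations are illustrative assumptions added for this example (the description above fixes the threshold at 0):

import numpy as np

def mp_neuron(x, threshold):
    # g(x) = sum of the binary inputs xi in {0, 1}
    g = np.sum(x)
    # h(g): output 1 if the sum reaches the threshold, else 0
    return 1 if g >= threshold else 0

# With hand-picked thresholds the same neuron implements fixed logic:
print(mp_neuron(np.array([1, 1]), threshold=2))  # logical AND of two inputs -> 1
print(mp_neuron(np.array([1, 0]), threshold=1))  # logical OR of two inputs  -> 1
# There is no learning: the threshold has to be set manually for each task.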

 In 1958, American psychologist Frank Rosenblatt proposed the first neuron model
that can automatically learn weights, called the perceptron. The error between the
output value o and the true value y is used to adjust the weights of the neuron
{w1, w2, …, wn}. Frank Rosenblatt then implemented the perceptron model on the
“Mark 1 perceptron” hardware. The input was an image sensor with 400
pixels, and the output had eight nodes; it could identify English letters.
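
A minimal sketch of this error-driven weight update, using the classic perceptron learning rule; the learning rate, step activation, and sample values are assumptions for illustration (this is not the Mark 1 hardware itself):

import numpy as np

def perceptron_step(w, b, x, y, lr=0.1):
    # Forward pass: weighted sum followed by a step activation
    o = 1 if np.dot(w, x) + b >= 0 else 0
    # Adjust weights and bias in proportion to the error (y - o)
    w = w + lr * (y - o) * x
    b = b + lr * (y - o)
    return w, b

# One update on a single (illustrative) sample with true label 0
w, b = np.zeros(2), 0.0
w, b = perceptron_step(w, b, x=np.array([1.0, 0.0]), y=0)
print(w, b)  # weights and bias shift so this sample is now classified as 0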
 The main flaw of linear models such as the perceptron is that they cannot handle even
simple linearly inseparable problems such as XOR. This directly led to a tough period for
perceptron-related research on neural networks. It is generally considered that 1969–1982
was the first winter of artificial intelligence.
 Although it was in the tough period of AI, there were still many significant studies
published such as back propagation (BP) algorithm, which is the core foundation of
modern deep learning algorithms. In fact, the mathematical idea of the BP algorithm has
been derived as early as the 1960s, but it had not been applied to neural networks at that
time.
 In 1974, American scientist Paul Werbos first proposed that the BP algorithm could be
applied to neural networks in his doctoral dissertation. Unfortunately, this result did not
receive enough attention at the time. In 1986, David Rumelhart et al. published a paper in
Nature using the BP algorithm for feature learning. Since then, the BP algorithm started
gaining widespread attention.
 During the second wave of artificial intelligence renaissance that started from 1982 to
1995, convolutional neural networks, recurrent neural networks, and backpropagation
algorithms were developed. In 1986, David Rumelhart, Geoffrey Hinton, and others
applied the BP algorithm to multilayer perceptrons.
 In 1989, Yann LeCun and others applied the BP algorithm to handwritten digital image
recognition and achieved great success, which is known as LeNet. The LeNet system was
successfully commercialized in zip code recognition, bank check recognition, and many
other systems.
 In 1997, one of the most widely used recurrent neural network variants, Long Short-Term
Memory (LSTM), was proposed by Sepp Hochreiter and Jürgen Schmidhuber. In the same
year, a bidirectional recurrent neural network was also proposed.
 Unfortunately, the study of neural networks then entered another tough period with the rise
of traditional machine learning algorithms represented by support vector machines (SVMs),
which is known as the second winter of artificial intelligence.
 Support vector machines have a rigorous theoretical foundation, require a small number of
training samples, and also have good generalization capabilities. In contrast, neural
networks at that time lacked theoretical foundation and were hard to interpret; deep networks
were difficult to train, and their performance was unremarkable.
1.4.2 DEEP NEURAL NETWORKS ALONG WITH ITS DEVELOPMENT
TIMELINE:

 We divide the development of neural networks into shallow neural network stages and
deep learning stages, with 2006 as the dividing point.
 In 2006, Geoffrey Hinton found that multilayer neural networks can be better trained
through layer-by-layer pre-training and achieved a better error rate than SVM on the
MNIST handwritten digital picture data set, turning on the third artificial intelligence
revival.
 In 2011, Xavier Glorot proposed the Rectified Linear Unit (ReLU) activation function,
which is now one of the most widely used activation functions.
 In 2012, Alex Krizhevsky proposed an eight-layer deep neural network AlexNet, which
used the ReLU activation function and Dropout technology to prevent overfitting.
 Since the AlexNet model was developed, various models have been published
successively, including VGG series, GoogleNet series, ResNet series, and DenseNet
series.
 In 2014, Ian Goodfellow proposed generative adversarial networks (GANs), which
learned the true distribution of samples through adversarial training to generate samples
with higher approximation. Since then, a large number of GAN models have been
proposed.
 In 2016, DeepMind applied deep neural networks to the field of reinforcement learning
and proposed the DQN algorithm, which achieved a level comparable to or even higher
than that of humans in 49 games on the Atari game platform.
 In the field of Go, the AlphaGo and AlphaGo Zero programs from DeepMind
successively defeated top human Go players Lee Sedol, Ke Jie, and others.
 On the multi-agent collaboration Dota 2 game platform, the OpenAI Five program
developed by OpenAI defeated the TI8 champion team OG in a restricted
game environment, showing a large number of professional high-level intelligent
operations.
 The original notes include a timeline (Figure 1-9) of the major milestones in deep
learning development between 2006 and 2019.
2. DEEP LEARNING CHARACTERISTICS
2.1 CHARACTERISTICS OF MODERN DEEP LEARNING ALGORITHMS:
Compared with traditional machine learning algorithms and shallow neural networks, modern
deep learning algorithms usually have the following characteristics.
 DATA VOLUME :
 Early machine learning algorithms are relatively simple and fast to train, and the size of
the required dataset is relatively small, such as the Iris flower dataset.
 With the development of computer technology, the designed algorithms are more and
more complex, and the demand for data volume is also increasing. With the rise of
neural networks, especially deep learning networks, the number of network layers and
model parameters are large.
 To prevent overfitting, the size of the training dataset usually needs to be huge. Although
deep learning has a high demand for large datasets, collecting data, especially collecting
labeled data, is often very expensive.
 The formation of a dataset usually requires manual collection or crawling of raw data,
cleaning out invalid samples, and then annotating the data samples with human
intelligence, so subjective bias and random errors are inevitably introduced.

 COMPUTING POWER :
 The increase in computing power is an important factor in the third artificial intelligence
renaissance.
 The real potential of deep learning was not realized until the release of AlexNet.
 Traditional machine learning algorithms do not have stringent requirements on data
volume and computing power like deep learning. But deep learning relies heavily on
parallel acceleration computing devices.
 Most of the current neural networks use parallel acceleration chips such as NVIDIA
GPU and Google TPU to train model parameters.
 For example, the AlphaGo Zero program needs to be trained on 64 GPUs from scratch
for 40 days before surpassing all AlphaGo historical versions. At present, the deep
learning acceleration hardware devices that ordinary consumers can use are mainly from
NVIDIA GPU graphics cards.

 Comparing the floating-point throughput (GFLOPS, billions of floating-point operations
per second) of NVIDIA GPUs and x86 CPUs from 2008 to 2017 shows that the x86
CPU curve changes relatively slowly, while the floating-point computing capacity of
NVIDIA GPUs grew exponentially, driven mainly by the growing demands of gaming
and deep learning computing.

 NETWORK SCALE:
 Early perceptron models and multilayer neural networks only had one to four
layers, and the number of network parameters was around tens of thousands.
 With the development of deep learning and the improvement of computing capabilities,
models such as AlexNet (8 layers), VGG16 (16 layers), GoogleNet (22 layers),
ResNet50 (50 layers), and DenseNet121 (121 layers) have been proposed successively,
while the size of input pictures has also gradually increased from 28×28 to 224×224
to 299×299 and even larger.
 The increase of network scale enhances the capacity of the neural networks
correspondingly, so that the networks can learn more complex data modalities and the
model performance can be improved accordingly.
 On the other hand, the increase of the network scale also means that we need more
training data and computational power to avoid overfitting.

 GENERAL INTELLIGENCE:
 Designing a universal intelligent mechanism that can automatically learn and self-adjust
like the human brain has always been the common vision of human beings.
 Deep learning is one of the algorithms closest to general intelligence. In the computer
vision field, previous methods that need to design features for specific tasks and add a
priori assumptions have been abandoned by deep learning algorithms.
 At present, almost all algorithms in image recognition, object detection, and semantic
segmentation are based on end-to-end deep learning models, which present good
performance and strong adaptability.
 On the Atari game platform, the DQN algorithm designed by DeepMind can reach
human equivalent level in 49 games under the same algorithm, model structure, and
hyperparameter settings, showing a certain degree of general intelligence.
 (Figure: DQN network structure.)

2.2 DEEP NEURAL NETWORKS VS SHALLOW NEURAL NETWORKS:


 The basic structure of a simple neural network in modern applications
consists of three layers: an input layer, a hidden layer (or middle layer),
and an output layer.
 The input layer consists of one node per input attribute or value. Example: if the
input is a 20×20 pixel image, the input layer will consist of 400 input nodes, each
representing a pixel.
 The middle layer consists of one or more hidden layers, which are
responsible for the majority of the transformations on the input data into
output signals, depending on their various synaptic weights and activation
function.
 The last layer, the output layer, combines all the signals or outputs from
the last hidden layer and performs a classification or output
transformation.
 A Shallow Neural Network is a simple neural network that consists of
only one or two hidden layers.
 A Deep Neural Network is a neural network that consists of many hidden layers.
 Both models can suffer from overfitting or poor generalization. Deep networks
include more parameters and hyperparameters than shallow ones, which
increases the probability of overfitting.

 Differences between SNN and DNN


 In addition to the input and output layers, neural networks have
intermediate layers, also known as hidden layers.
 Shallow networks have fewer hidden layers, so the number of
parameters increases significantly when they are used to fit
complex functions.
 Deep neural networks have multiple hidden layers, which helps them fit
complex functions with fewer parameters compared to a shallow
network.
 Shallow neural networks process the features directly, while deep
networks extract features automatically along with the training.
 The structure of shallow networks allows them to learn important features
independently from other features, which is best suited for learning tasks
dealing with data with low dimensionality and a relatively small number
of features.
 Deep neural networks excel in very complex tasks that have large input
data and high dimensionality.
 These networks have multiple hidden layers, with lower layers
learning low-level features and higher layers learning higher-level
features that build on the patterns and knowledge derived from the
lower layers. Learning is therefore layer-dependent: the hidden
layers reuse the features learned by lower layers to learn more
complex features and solve more complex tasks.
 Deep learning works best when the training set is huge and feature
set is complex.
 Shallow network works best when data has low dimensionality and
relatively less important feature set.
 Deep neural networks excel in areas such as computer vision,
speech recognition, and signal processing.
 Shallow neural networks are widely used for simple regression tasks.
2.3 DEEP LEARNING AND ITS APPLICATIONS:

 Deep learning is a class of machine learning algorithms that uses


multiple layers to progressively extract higher-level features from the
raw input.
Example: In image processing, lower layers may identify edges,
while higher layers may identify the concepts relevant to a human
such as digits or letters or faces.
 A Deep learning process can learn which features to optimally place
in which level on its own.
 Each level learns to transform its input data into a slightly more
abstract and composite representation.
 This does not eliminate the need for hand-tuning; for example,
varying numbers of layers and layer sizes can provide different
degrees of abstraction.
Applications of Deep Learning
DAILY LIFE APPLICATIONS:
1. Voice assistants in mobile phones.
2. Intelligent assisted driving in cars.
MAINSTREAM APPLICATIONS:
1. Computer vision
2. NLP
3. Reinforcement learning
Computer Vision Applications
1. Image classification
2. Object detection
3. Semantic segmentation
4. Video Understanding
Natural Language Processing Applications
1. Machine Translation
2. Chatbot
Reinforcement Learning Applications
1. Virtual Games.
2. Robotics
3. Autonomous driving

Image classification
• The input to the neural network is a picture, and the output is the
probability that the current sample belongs to each category.
• Generally, the category with the highest probability is selected as the
predicted category of the sample.
Chatbot
• Mainstream task of natural language processing.
• Machines automatically learn to talk to humans, provide satisfactory
automatic responses to simple human demands, and improve customer
service efficiency and service quality.
• Neural network is trained to generate appropriate responses for input
questions.
• Chatbot is often used in consulting systems, entertainment systems, and
smart homes.

Virtual Games
• Virtual game platforms can be used both to train and to test reinforcement
learning algorithms.
• The neural network trains and learns in a complex environment on the basis
of a reward function.
• Virtual platforms avoid interference from irrelevant factors while also
minimizing the cost of experiments.

MAINSTREAM APPLICATIONS OF DEEP LEARNING:


Deep learning algorithms have been widely used in our daily life, such as voice
assistants in mobile phones, intelligent assisted driving in cars, and face
payments. We will introduce some mainstream applications of deep learning
starting with computer vision, natural language processing, and reinforcement
learning.

 Fraud detection:
 Fraud is a growing problem in the digital world. In 2021,
consumers reported 2.8 million cases of fraud to the Federal Trade
Commission. Identity theft and imposter scams were the two most
common fraud categories.
 To help prevent fraud, companies like Signifyd use deep
learning to detect anomalies in user transactions. Those companies
deploy deep learning to collect data from a variety of sources,
including the device location, length of stride and credit card
purchasing patterns to create a unique user profile.
 Mastercard has taken a similar approach, leveraging its Decision
Intelligence and AI Express platforms to more accurately detect
fraudulent credit card activity. And for companies that rely on e-
commerce, Riskified is making consumer finance easier by
reducing the number of bad orders and chargebacks for merchants.
Relevant companies: Neurala, ZeroEyes, Motional

 Computer Vision:
 Image classification is a common computer vision classification task. The
input to the neural network is a picture, and the output is the
probability that the current sample belongs to each category.
 Generally, the category with the highest probability is selected as the
predicted category of the sample.
 Image recognition is one of the earliest successful applications of deep
learning. Classic neural network models include VGG series, Inception
series, and ResNet series.
 Object detection refers to the automatic detection of the approximate
location of common objects in a picture by an algorithm. It is usually
represented by a bounding box and classifies the category information of
objects in the bounding box. Common object detection algorithms are
RCNN, Fast RCNN, Faster RCNN, Mask RCNN, SSD, and YOLO
series.
 Semantic segmentation is an algorithm to automatically segment and
identify the content in a picture. We can understand semantic
segmentation as the classification of each pixel and analyze the category
information of each pixel. Common semantic segmentation models
include FCN, U-net, SegNet, and DeepLab series.
 Video Understanding. As deep learning achieves better results on 2D
picture–related tasks, 3D video understanding tasks with temporal
dimension information (the third dimension is the sequence of frames) are
receiving more and more attention. Common video understanding tasks
include video classification, behavior detection, and video subject
extraction. Common models are C3D, TSN, DOVF, and TS_LSTM.
 Image generation learns the distribution of real pictures and samples
from the learned distribution to obtain highly realistic generated pictures.
At present, common image generation models include VAE series and
GAN series. Among them, the GAN series of algorithms have made
great progress in recent years.

 Agriculture
 Agriculture will remain a key source of food production in the
coming years, so people have found ways to make the process
more efficient with deep learning and AI tools. AI is used to detect
intrusive wild animals, forecast crop yields, and power self-driving
machinery.
 Blue River Technology has explored the possibilities of self-driven
farm products by combining machine learning, computer vision
and robotics. The results have been promising, leading to smart
machines — like a lettuce bot that knows how to single out weeds
for chemical spraying while leaving plants alone. In addition,
companies like Taranis blend computer vision and deep learning to
monitor fields and prevent crop loss due to weeds, insects and
other causes.
 Relevant Companies: Blue River Technology, Taranis

 Natural language processing


 The introduction of natural language processing technology has
made it possible for machines to read messages and extract meaning
from them. Still, the process can be somewhat
oversimplified, failing to account for the ways that words combine
to change the meaning or intent behind a sentence.
 Deep learning enables natural language processors to identify more
complicated patterns in sentences to provide a more accurate
interpretation. Companies use deep learning to power a chatbot that
is able to respond to a larger volume of messages and provide more
accurate responses. Grammarly also uses deep learning in
combination with grammatical rules and patterns to help users
identify the errors and tone of their messages.
Relevant companies: Gamalon, Strong, Grammarly

 Autonomous vehicles
 Driving is all about taking in external factors like the cars around you,
street signs and pedestrians and reacting to them safely to get from point
A to B. While we’re still a ways away from fully autonomous vehicles,
deep learning has played a crucial role in helping the technology come to
fruition.
 It allows autonomous vehicles to take into account where you want to go,
predict what the obstacles in your environment will do and create a safe
path to get you to that location.
Relevant companies: Zoox, Tesla, Waymo

 Climate change
 Organizations are stepping up to help people adapt to quickly accelerating
environmental change. One Concern has emerged as a climate
intelligence leader, factoring environmental events such as extreme
weather into property risk assessments.
 Meanwhile, NCX has expanded the carbon-offset movement to include
smaller landowners by using AI technology to create an affordable carbon
marketplace.
 Entertainment
 Streaming platforms aggregate tons of data on what content you choose
to consume and what you ignore. Take Netflix as an example. The
streaming platform uses machine learning to find patterns in what its
viewers watch so that it can create a personalized experience for its users.
Relevant companies: Amazon, Netflix
VARIOUS DEEP LEARNING FRAMEWORKS:
1) Theano is one of the earliest deep learning frameworks. It was developed by
Yoshua Bengio and Ian Goodfellow. It is a Python-based computing library
focused on low-level operations. Theano supports both GPU and CPU
operations. Due to Theano's low development efficiency, long model
compilation time, and developers switching to TensorFlow, Theano has now
stopped being maintained.
2) Scikit-learn is a complete computing library for machine learning
algorithms. It has built-in support for common traditional machine learning
algorithms, and it has rich documentation and examples. However, scikit-
learn is not specifically designed for neural networks. It does not support
GPU acceleration, and the implementation of neural network–related layers
is also lacking.
3) Caffe was developed by Jia Yangqing in 2013. It is mainly used for
applications using convolutional neural networks and is not suitable for other
types of neural networks. Caffe's main development language is C++, and it
also provides interfaces for other languages such as Python. It also supports
GPU and CPU. Owing to its early development and high visibility in the
industry, Facebook launched an upgraded version of Caffe, Caffe2, in 2017.
Caffe2 has now been integrated into the PyTorch library.
4) Torch is a very good scientific computing library, developed based on the
less popular programming language Lua. Torch is highly flexible, and it is
easy to implement a custom network layer, which is also an excellent gene
inherited by PyTorch. However, due to the small number of Lua language
users, Torch has been unable to obtain mainstream applications.
5) Keras is a high-level framework implemented based on the underlying
operations provided by frameworks such as Theano and TensorFlow. It
provides a large number of high-level interfaces for rapid training and
testing. For common applications, developing with Keras is very efficient.
But because there is no low-level implementation, the underlying framework
needs to be abstracted, so the operation efficiency is not high, and the
flexibility is average.
Keras can be understood as a set of high-level API design specifications.
Keras itself has an official implementation of the specifications. The same
specifications are also implemented in TensorFlow, which is called the
tf.keras module, and tf.keras will be used as the unique high-level interface
to avoid interface redundancy.
6) TensorFlow is a deep learning framework released by Google in 2015. The
initial version only supported symbolic programming. Thanks to its earlier
release and Google’s influence in the field of deep learning, TensorFlow
quickly became the most popular deep learning framework. However, due to
frequent changes in the interface design, redundant functional design, and
difficulty in symbolic programming development and debugging,
TensorFlow 1.x was once criticized by the industry. In 2019, Google
launched the official version of TensorFlow 2, which runs in dynamic graph
priority mode and avoids many defects of the TensorFlow 1.x version.
TensorFlow 2 has been widely recognized by the industry. At present,
TensorFlow and PyTorch are the two most widely used deep learning
frameworks in industry. TensorFlow has a complete solution and user base
in the industry. Thanks to its streamlined and flexible interface design,
PyTorch can quickly build and debug networks, which has received rave
reviews in academia. TensorFlow 2 also makes it easier for users to learn
TensorFlow and seamlessly deploy models to production.
BIOLOGICAL NEURON VS BASIC NEURON MODEL:
Neurons are the building blocks of the nervous system. They receive and transmit signals to
different parts of the body. This is carried out in both physical and electrical forms.
 (Figure: structure of a biological neuron.)

 Biological Neural Network:


Biological Neural Network (BNN) is a structure that consists of Synapse, dendrites,
cell body, and axon. In this neural network, the processing is carried out by neurons.
Dendrites receive signals from other neurons, Soma sums all the incoming signals
and axon transmits the signals to other cells.

Artificial Neuron:
An artificial neuron is a connection point in an artificial neural network. Artificial neural
networks, like the human body's biological neural network, have a layered architecture and
each network node (connection point) has the capability to process input and forward output
to other nodes in the network.
 (Figure: structure of an artificial neuron.)
 Artificial Neural Network:
Artificial Neural Network (ANN) is a type of neural network that is based on a
feed-forward strategy. It is called this because information is passed forward through
the nodes until it reaches the output node. This is also the simplest
type of neural network.

NEURON MODEL:
• In 1943, the psychologist Warren McCulloch and mathematical logician Walter
Pitts proposed a mathematical model of artificial neural networks to simulate the
mechanism of biological neurons.

• This research was further developed by the American neurologist Frank Rosenblatt
into the perceptron model which is also the cornerstone of modern deep learning.

• The neuron input vector x = [x1, x2, x3, …, xn]T maps to y through function fθ : x →
y, where θ represents the parameters in the function f.

• Consider a simplified case, such as a linear transformation: f(x) = wTx + b.

• The expanded form is f(x) = w1x1 + w2x2 + ⋯ + wnxn + b.

• The parameters θ = {w1, w2, w3, …, wn, b} determine the state of the neuron, and
the processing logic of this neuron can be determined by fixing those parameters.
3.1 REGRESSION : NEURON MODEL
Regression is a ML algorithm which is used to find the relationship between a
dependent variable and other independent variables.
1. In simple linear regression, the number of input variables is one.
2. When the number of input nodes n = 1 (single input), the neuron model can
be further simplified as
y = wx + b
w - slope of the straight line
b - bias (intercept) of the straight line
3. Then we can plot the change of y as a function of x , which is in the form of a
straight line.
4. In order to estimate the value of w and b, we only need to sample any two
data points (x(1), y(1)) and (x(2), y(2)) from the straight line.

• From these two points we obtain the equations
y(1) = wx(1) + b
y(2) = wx(2) + b
which can be solved to get the values of w and b. In the specific example used in these
notes, solving the two linear equations gives w = 1.477 and b = 0.089.
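
A minimal NumPy sketch of this two-point solution; the two sample points below are assumed to lie exactly on the line y = 1.477x + 0.089 for illustration:

import numpy as np

# Two sampled points (x, y) assumed to lie on the line y = 1.477x + 0.089
x1, y1 = 1.0, 1.477 * 1.0 + 0.089
x2, y2 = 2.0, 1.477 * 2.0 + 0.089

# y = w*x + b at both points gives a 2x2 linear system in (w, b)
A = np.array([[x1, 1.0],
              [x2, 1.0]])
rhs = np.array([y1, y2])
w, b = np.linalg.solve(A, rhs)
print(w, b)  # recovers w ≈ 1.477 and b ≈ 0.089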
• For models other than simple linear regression, the number of required data
points is generally different.
• For linear neuron models with N inputs, we only need to sample N + 1
different data points.
REGRESSION NEURON MODEL CONTINUED:

• Practically, there may be observation errors at any sampling point, so we
assume that the observation error variable ε follows a normal distribution
ε ~ N(μ, σ²)
where μ is the mean and σ² is the variance. Then the samples follow:
y = wx + b + ε

Effect of Estimation Bias:


• Once observation error is introduced, even for a model as simple as a linear
one, estimating from only two sampled data points may bring a large
estimation bias.
• For example, if the estimation is based on only two sampled points, the estimated
line (the dotted line in the original figure) can deviate considerably from the true
straight line.
• In order to reduce the estimation bias introduced by observation errors,
we can sample multiple data points.
• We should then find the best straight line, the one that minimizes the
sum of errors between all sampling points and the straight line.
• Due to the existence of observation errors, there may not be a
straight line that perfectly passes through all the sampling points
in D.
• Therefore, we try to find a good straight line close to all
sampling points with the help of the Mean Squared Error (MSE).

• The mean squared error (MSE) between the predicted
value wx(i) + b and the true value y(i) at all sampling points is used as
the total error, that is
L = (1/n) ∑ i=1..n (wx(i) + b − y(i))²
• Then we search for a set of parameters w∗ and b∗ that minimizes the total error L.
• The straight line corresponding to the minimal total error is the optimal
straight line we are looking for, that is
w∗, b∗ = argmin w,b (1/n) ∑ i=1..n (wx(i) + b − y(i))²
• where n is the number of samples.
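
A minimal NumPy sketch of this objective; np.polyfit is used here only to illustrate that a best-fit line minimizing the squared error exists (the gradient-descent solution used in these notes follows in the next sections), and the synthetic data is an assumption:

import numpy as np

# Noisy samples around the line y = 1.477x + 0.089 (illustrative)
x = np.random.uniform(-10., 10., size=100)
y = 1.477 * x + 0.089 + np.random.normal(0., 0.1, size=100)

def mse(w, b, x, y):
    # L = (1/n) * sum((w*x_i + b - y_i)^2)
    return np.mean((w * x + b - y) ** 2)

# A degree-1 least-squares fit returns the w*, b* that minimize L
w_star, b_star = np.polyfit(x, y, 1)
print(w_star, b_star, mse(w_star, b_star, x, y))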


OPTIMISATION METHOD

• We need to find the optimal parameters w∗ and b∗, so that the input and output satisfy the
linear relationship y(i) = wx(i) + b, i ∈ [1, n].

• For a single-input neuron model, only two samples are needed to obtain the exact solution
of the equations by the elimination method.

• This exact solution derived by a strict formula is called an analytical solution.

• What about in the case of multiple data points??

• We can only use numerical optimization methods to obtain an approximate numerical


solution

• Why is it called optimization?

• Because the computer's calculation speed is very fast, we can use this powerful
computing power to “search” and “try” many times, thereby reducing the error L step by
step.

BRUTE-FORCE ALGORITHM:

• Simplest optimization method.

• It is also known as random search or random experiment.

• To find the most suitable w∗ and b∗, we can randomly sample w and b from the real
number space and calculate the error value L of the corresponding model.

• We then pick the smallest error L∗ from all the experiments, and its corresponding w∗ and b∗
are the optimal parameters we are looking for.

• ADVANTAGES: simple and straightforward

• DISADVANTAGES: extremely inefficient for large-scale, high-dimensional optimization


problems
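
A minimal sketch of this random-search idea, assuming the same kind of noisy linear data used in this unit; the sampling range and number of trials are arbitrary assumptions:

import numpy as np

# Synthetic noisy data around y = 1.477x + 0.089 (illustrative)
x = np.random.uniform(-10., 10., size=100)
y = 1.477 * x + 0.089 + np.random.normal(0., 0.1, size=100)

best_w, best_b, best_loss = None, None, float('inf')
for _ in range(10000):
    # Randomly sample candidate parameters from an assumed range
    w = np.random.uniform(-5., 5.)
    b = np.random.uniform(-5., 5.)
    loss = np.mean((w * x + b - y) ** 2)
    # Keep the candidate with the smallest error seen so far
    if loss < best_loss:
        best_w, best_b, best_loss = w, b, loss

print(best_w, best_b, best_loss)  # crude estimates of w*, b*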

GRADIENT DESCENT ALGORITHM:

• Most commonly used optimization algorithm in neural network training.

• Parallel acceleration capability of powerful graphics processing unit (GPU) chips.

• Very suitable for optimizing neural network models with massive data.

• It is also suitable for optimizing our simple linear neuron model.

• It is a Core algorithm of deep learning.

• The concept of the derivative can be used to find the maximum and minimum values of a function.

• Stationary points can be found by setting the derivative to zero.

• The type of each stationary point can then be checked.

• Consider, for example, f(x) = sin(x).

• In the interval x ∈ [−10, 10], f(x) can be drawn as a solid line and its derivative as a
dotted line.

• The gradient of a function is defined as the vector of partial derivatives of the function with
respect to each independent variable.

• Let z be a function of the two independent variables x and y (a surface in three dimensions):

z = f(x, y)

• Let ∂z/∂x be the partial derivative of the function with respect to the independent variable x, and

∂z/∂y be the partial derivative of the function with respect to the independent variable y.

• The gradient ∇f = [∂z/∂x, ∂z/∂y] is a vector.
• Example: f(x, y) = −(x² + y²)
• NOTE: The direction of the arrow always points to the direction where the function value
increases.The steeper the function surface, the longer the length of the arrow, and the
larger the modulus of the gradient.

CONCEPT OF GRADIENT DESCENT:

• We have observed that gradient direction of the function always points to the direction in
which the function value increases.

• The opposite direction of the gradient should point to the direction in which the function
value decreases.

x′ = x − η·∇f(x)

We iteratively update x using the above equation to obtain smaller and smaller
function values.

 x′ = x − η·∇f(x)
 Here η is known as the LEARNING RATE
 It is used to scale the gradient vector
 It is generally set to a small value such as 0.01 or 0.001
NOTE: FOR ONE-DIMENSIONAL FUNCTIONS, the above update can be written as:

x′ = x − η·(dy/dx)
 The method of optimizing parameters by the formula θ′ = θ − η·∇f is called the
gradient descent algorithm.
 It calculates the gradient ∇f of the function f.
 It iteratively updates the parameters θ to obtain the optimal numerical solution of the
parameters θ when the function f reaches its minimum value.
 NOTE: model inputs in deep learning are generally represented as x, and the parameters
to be optimized are generally represented by θ, w, and b.
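
A minimal one-dimensional sketch of this update rule, assuming f(x) = x² as an example function and an illustrative learning rate:

# Minimize f(x) = x**2 with gradient descent; f'(x) = 2x
x = 5.0    # arbitrary starting point
lr = 0.1   # learning rate (eta)
for _ in range(100):
    grad = 2 * x       # derivative of f at the current x
    x = x - lr * grad  # step against the gradient direction
print(x)  # very close to 0, the minimum of f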
 Let's go back to the REGRESSION MODEL, where optimization was needed for the MSE:
L = (1/n) ∑ i=1..n (wx(i) + b − y(i))²

 We apply the gradient descent algorithm to calculate the optimal parameters w∗
and b∗ by minimizing the mean squared error L.
 The model parameters that need to be optimized are w and b, and they are
updated using w′ = w − η·(∂L/∂w) and b′ = b − η·(∂L/∂b)

FEATURES OF VANILLA, STOCHASTIC AND MINI-BATCH GRADIENT DESCENT ALGORITHMS:

Vanilla (Batch) Gradient Descent:
• Computes the gradient using the whole training set.
• Slow and computationally expensive.
• Not suggested for huge training sets.
• The cost function reduces smoothly.
• Gives the optimal solution, given sufficient time to converge.
• No random shuffling of points is required.
• Cannot escape shallow local minima easily.
• Convergence is slow.
• Generally used for small training sets that fit into computer memory.

Stochastic Gradient Descent (SGD):
• Computes the gradient using a single training sample.
• Faster and less computationally expensive than vanilla gradient descent.
• Can be used for large training sets, but may be slow when datasets are huge.
• The cost function shows a lot of variation.
• Gives a good solution, but not the optimal one.
• The data samples should be in a random order, which is why the training set is shuffled every epoch.
• Can escape shallow local minima more easily.
• Reaches convergence much faster than vanilla gradient descent.
• Forms the basis of more advanced stochastic algorithms used in training artificial neural networks.

Mini-Batch Gradient Descent:
• Computes the gradient using a subset (mini-batch) of the training set.
• Computation time is less than SGD, and computation cost is less than vanilla gradient descent.
• Can be used for large training sets and is also faster than SGD.
• The cost function is smoother compared to SGD.
• Gives a good solution in less time compared to SGD.
• The data samples are shuffled in a random order and then divided into batches.
• Can escape shallow local minima more easily than vanilla gradient descent.
• At times can reach convergence faster than SGD.
• The most commonly used variant in practical applications.

 Mini-batch gradient descent is a variation of the gradient descent
algorithm that splits the training dataset into small batches that are used
to calculate the model error and update the model coefficients.
 Batch gradient descent avoids a noisy gradient, but we
can get stuck in local minima and saddle points.
 With stochastic gradient descent we may have difficulty settling on the
global minimum, but we usually don't get stuck in local minima.
 The mini-batch approach is the default way to implement the
gradient descent algorithm in deep learning. It combines the
advantages of the other methods while avoiding their disadvantages.
Advantages of Mini-Batch Gradient Descent :
1. Computational Efficiency: In terms of computational efficiency, this
technique lies between the two previously introduced techniques.
2. Stable Convergence: Another advantage is the more stable converge
towards the global minimum since we calculate an average gradient over
n samples that results in less noise.
3. Faster Learning: As we perform weight updates more often than with
stochastic gradient descent, in this case, we achieve a much faster
learning process.

 Analogous to batch gradient descent, we compute and average the
gradients across the data instances in a mini-batch. The gradient descent
step is performed after each mini-batch of samples has been processed.
 We use neither the whole dataset at once nor a single example
at a time; instead, we use a batch of a fixed number of training examples,
smaller than the actual dataset, and call it a mini-batch.
 So, after creating the mini-batches of fixed size, we do the following
steps in one epoch (a sketch in code follows the list):
1. Pick a mini-batch
2. Feed it to Neural Network

3. Calculate the mean gradient of the mini-batch


4. Use the mean gradient we calculated in step 3 to update the
weights
5. Repeat steps 1–4 for the mini-batches we created.
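
A minimal NumPy sketch of these steps for the linear model used throughout this unit; the batch size, learning rate, and synthetic data are assumptions for illustration:

import numpy as np

# Synthetic noisy data around y = 1.477x + 0.089
x = np.random.uniform(-10., 10., size=1000)
y = 1.477 * x + 0.089 + np.random.normal(0., 0.1, size=1000)

w, b, lr, batch_size = 0.0, 0.0, 0.01, 32
for epoch in range(10):
    # Shuffle the data, then split it into mini-batches
    idx = np.random.permutation(len(x))
    for start in range(0, len(x), batch_size):
        batch = idx[start:start + batch_size]
        xb, yb = x[batch], y[batch]
        # Mean gradient of the MSE over the mini-batch
        err = w * xb + b - yb
        grad_w = 2 * np.mean(err * xb)
        grad_b = 2 * np.mean(err)
        # Update the parameters with the mean gradient
        w -= lr * grad_w
        b -= lr * grad_b

print(w, b)  # approaches 1.477 and 0.089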

Vanilla Gradient Descent


 This is the simplest form of the gradient descent technique. Here, vanilla
means pure, without any modification.
 Its main feature is that we take small steps in the direction of the
minimum by following the gradient of the cost function.
 There is a high chance of getting stuck in local minima.
Therefore, the learning rate needs to be chosen very carefully.

Adam optimization algorithm


 The Adam optimization algorithm is an extension to stochastic
gradient descent that has recently seen broader adoption for deep
learning applications.
 Adam combines the best properties of the AdaGrad and RMSProp
algorithms to provide an optimization algorithm that can handle sparse
gradients on noisy problems
 Instead of adapting the parameter learning rates based only on the average
first moment (the mean), Adam also makes use of the average of the
second moments of the gradients (the uncentered variance).
Main Advantage:
 Adam optimization's main advantage over vanilla gradient descent is
its capability of using both momentum and adaptive gradients.
 Momentum carries Adam beyond the local minima where vanilla
gradient descent can get stuck, and the adjustment from the accumulated
gradients gives Adam a proper direction to explore and eventually
find the global minimum.
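
A minimal illustration of selecting Adam with tf.keras (the learning rate shown is the library default and is only illustrative); it is used exactly like the optimizer object in the hands-on classification section later:

import tensorflow as tf

# Adam keeps running estimates of the first and second moments of the gradients
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)

# Typical usage inside a training step:
# grads = tape.gradient(loss, model.trainable_variables)
# optimizer.apply_gradients(zip(grads, model.trainable_variables))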
SIMPLE PYTHON PROGRAM TO ACHIEVE A MODEL OUTPUT
Y=1.477*X+0.089+EPS (WITH RANDOM ERRORS)
This Python program implements a simple linear regression model with gradient
descent optimization to find the parameters w and b that minimize the
mean squared error (MSE) between the model's predictions and the actual data.

import numpy as np

data = []  # A list to save data samples
for i in range(100):  # repeat 100 times
    # Randomly sample x from a uniform distribution
    x = np.random.uniform(-10., 10.)
    # Randomly sample the noise from a Gaussian distribution
    eps = np.random.normal(0., 0.01)
    # Calculate model output with random errors
    y = 1.477 * x + 0.089 + eps
    data.append([x, y])  # save to data list
data = np.array(data)  # convert to a 2D Numpy array

def mse(b, w, points):
    # Calculate MSE based on current w and b
    totalError = 0
    # Loop through all points
    for i in range(0, len(points)):
        x = points[i, 0]  # Get ith input
        y = points[i, 1]  # Get ith output
        # Accumulate the squared error
        totalError += (y - (w * x + b)) ** 2
    # Calculate the mean of the total squared error
    return totalError / float(len(points))

def step_gradient(b_current, w_current, points, lr):
    # Calculate gradients and update w and b.
    b_gradient = 0
    w_gradient = 0
    M = float(len(points))  # total number of samples
    for i in range(0, len(points)):
        x = points[i, 0]
        y = points[i, 1]
        # dL/db: grad_b = 2(wx+b-y)
        b_gradient += (2/M) * ((w_current * x + b_current) - y)
        # dL/dw: grad_w = 2(wx+b-y)*x
        w_gradient += (2/M) * x * ((w_current * x + b_current) - y)
    # Update w', b' according to the gradient descent algorithm
    # lr is the learning rate
    new_b = b_current - (lr * b_gradient)
    new_w = w_current - (lr * w_gradient)
    return [new_b, new_w]

def gradient_descent(points, starting_b, starting_w, lr, num_iterations):
    # Update w, b multiple times
    b = starting_b  # initial value for b
    w = starting_w  # initial value for w
    # Iterate num_iterations times
    for step in range(num_iterations):
        # Update w, b once
        b, w = step_gradient(b, w, np.array(points), lr)
        # Calculate current loss
        loss = mse(b, w, points)
        if step % 50 == 0:  # print loss and w, b
            print(f"iteration:{step}, loss:{loss}, w:{w}, b:{b}")
    return [b, w]  # return the final values of w and b


def main():
    # Load training dataset
    data = []
    for i in range(100):
        x = np.random.uniform(3., 12.)
        # mean=0, std=0.1
        eps = np.random.normal(0., 0.1)
        y = 1.477 * x + 0.089 + eps
        data.append([x, y])
    data = np.array(data)
    lr = 0.01  # learning rate
    initial_b = 0  # initialize b
    initial_w = 0  # initialize w
    num_iterations = 150
    # Train 150 times and return the optimal w*, b* and the corresponding loss
    [b, w] = gradient_descent(data, initial_b, initial_w, lr, num_iterations)
    loss = mse(b, w, data)  # Calculate MSE
    print(f'Final loss:{loss}, w:{w}, b:{b}')


if __name__ == '__main__':
    main()

Generating Data:

The program starts by importing the necessary libraries, particularly NumPy for numerical
operations.
It initializes an empty list data to store data samples.
It generates 100 data samples by iterating over a loop 100 times:
 Randomly samples x from a uniform distribution between -10 and 10.
 Randomly samples ε (error term) from a Gaussian distribution with mean 0 and standard
deviation 0.01.
 Calculates the model output y=1.477×x+0.089+ε with random errors.
 Appends the data sample [x, y] to the data list.
 Finally, it converts the data list into a 2D NumPy array.

Mean Squared Error (MSE) Function (mse):

Defines a function mse(b, w, points) to calculate the mean squared error given the current values of
w and b and the dataset. It iterates over all data points, calculates the squared error for each point,
and sums them up. Returns the mean of the squared errors.

Gradient Descent Step Function (step_gradient):

Defines a function step_gradient(b_current, w_current, points, lr) to perform one step of gradient
descent. Calculates the gradients for b and w using the partial derivatives of the loss function with
respect to b and w (which are derived from the mean squared error). Updates the values of b and w
using the gradients and the learning rate (lr).

Main Function:

Defines a main function to run the program. Generates a new dataset with 100 samples. Sets the
learning rate (lr), initial values for b and w, and the number of iterations. Calls gradient_descent to
train the model and obtain the optimal b and w values. Calculates the final loss using the optimal
parameters and prints the results.
CLASSIFICATION

• Aim: We will take 0–9 digital picture recognition as an example to explore how to use
machine learning to solve the classification problem.

• Dataset: MNIST data set.

• It is a handwritten digits dataset; images are generally scaled to a fixed size, such as 28×28 pixels,
and for simplicity only grayscale information is retained.

• These pictures are used as the input data x, and the data is labeled.

• MNIST data set contains real handwritten pictures of numbers 0–9. Each number has a total
of 7,000 pictures, collected from different writing styles.

• Of these, 60,000 images are used for training and 10,000 for testing.

NOTE:

• Generally, pixel values are integers ranging from 0 to 255 to express color intensity
information.

• For example, 0 represents the lowest intensity, and 255 indicates the highest intensity.

• If it is a color picture, each pixel contains the intensity information of the three channels R,
G, and B, which, respectively, represent the color intensity of colors red, green, and blue.
• Therefore, each pixel of a color picture is represented by a one-dimensional vector with
three elements, which represent the intensity of R, G, and B colors.

• As a result, a color image is saved as a tensor with dimension [h, w, 3].

• A grayscale picture only needs a two-dimensional matrix with shape [h, w] or a
three-dimensional tensor with shape [h, w, 1] to represent its information.

• Consider the matrix content of a picture of the number 8: the black pixels in the
picture are represented by 0, and the grayscale information is represented by values in 0–255.

• The whiter pixels in the picture correspond to the larger values in the matrix.

Steps to download, manage, and load the MNIST dataset
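
A minimal sketch of these steps using the tf.keras dataset helper (consistent with the TensorFlow code used later in this unit); the normalization shown is an illustrative preprocessing choice:

import tensorflow as tf

# Download (on the first call) and load the MNIST dataset:
# 60,000 training and 10,000 test grayscale images of shape 28x28
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
print(x_train.shape, y_train.shape)  # (60000, 28, 28) (60000,)

# Scale pixel values from [0, 255] to [0, 1] before training
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0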


NOTE:

• We use a matrix of shape [h, w] to represent a picture.

• For multiple pictures, we can add one more dimension in front and use a tensor of shape [b,
h, w] to represent them.

• Here b represents the batch size.

• Color pictures can be represented by a tensor with the shape of [b, h, w, c], where c
represents the number of channels, which is 3 for color pictures.
CLASSIFICATION- BUILDING A MODEL

• We take x as the input. If it is a single scalar input, the model can be expressed as
y = wx + b.

• For a multi-input, single-output model structure, we use y = wTx + b,

where w = [w1, w2, …, wn]T and x = [x1, x2, …, xn]T.

• By combining multiple multi-input, single-output neuron models, we can build a multi-input,


multi-output model.

y =Wx + b

• For multiple-output and batch training, we write the model in batch form:

Y = X@W +b

• din represents input dimension, and dout indicates output dimension.

• X has shape [b, din], b is the number of samples and din is the length of each sample

• W has shape [din, dout], containing din ∗ dout parameters.

• Bias vector b has shape dout.

• The @ symbol means matrix multiplication. Since the result of the operation X @ W is a
matrix of shape [b, dout], it cannot be directly added to the vector b.

• Therefore, the + sign in batch form needs to support broadcasting, that is, expand the vector
b into a matrix of shape [b, dout] by replicating b.

• Let's build a neural network of this form with 3 inputs and 2 outputs; a minimal sketch follows.
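
A minimal Keras sketch of such a multi-input, multi-output linear layer (assuming TensorFlow, as in the hands-on section later; the batch of random inputs is illustrative):

import tensorflow as tf
from tensorflow.keras import layers

# A single Dense layer computes y = x @ W + b with W of shape [3, 2] and b of shape [2]
layer = layers.Dense(2)
x = tf.random.normal([4, 3])   # batch of 4 samples, 3 input features each
y = layer(x)                   # output shape [4, 2]
print(y.shape, layer.kernel.shape, layer.bias.shape)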
• A grayscale image is stored using a matrix with shape [h, w], and b pictures are stored using
a tensor with shape [b, h, w].

• However, our model can only accept vectors, so we need to flatten the [h, w] matrix into a
vector of length [h ⋅ w]. Thus the length of the input features din = h ⋅ w.

• The output can be represented as a vector of length dout, where dout is the same
as the number of categories.

• For example, if the output belongs to the first category, then the corresponding index is set
to 1, and the other positions are set to 0.

• This encoding method is called one-hot encoding.
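
A small illustration of one-hot encoding with TensorFlow (the label value 2 and depth 10 are illustrative):

import tensorflow as tf

# Label "2" out of 10 classes becomes a length-10 vector with a 1 at index 2
print(tf.one_hot(2, depth=10).numpy())
# [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]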


ERROR CALCULATION

• For classification problems, our goal is to maximize a certain performance metric, such as
accuracy.

• But when accuracy is used as a loss function, it is in fact non-differentiable.

• As a result, the gradient descent algorithm cannot be used to optimize the model
parameters.

• For the error calculation of a classification problem, it is more common to use the cross-
entropy loss function instead of the mean squared error loss function introduced in the
regression problem.

MAJOR ISSUES in Handwritten digital picture recognition problems are:

1. A linear model is not enough because:

– It is one of the simplest models in machine learning.

– It has only a few parameters

– It can only express linear relationships.

The perception and decision-making of complex brains are far more complex than a linear model.

2. Complexity:

• It is the model ability to approximate complex distributions.

• The preceding solution only uses a one-layer neural network model composed of a small
number of neurons.

• Compared with the 100 billion neuron interconnection structure in the human brain, its
generalization ability is obviously weaker.

Example of model complexity and data distribution:

• Consider a set of sampling points with observation errors whose actual distribution
follows a quadratic (parabolic) model.

• If you use a linear model to fit the data, it is difficult to learn a good model.

• If you use a suitable polynomial function model to learn, such as a quadratic polynomial, you
can learn a suitable model.

• But when the model is too complex, such as a ten-degree polynomial, it is likely to overfit
and hurt the generalization ability of the model.
So what is the solution?
NON LINEAR MODEL

• Since a linear model is not feasible, we can embed a nonlinear function in the linear model
and convert it to a nonlinear model.

• We call this nonlinear function the activation function, which is represented by σ:

o= σ(Wx+b)

Activation Function:

• Activation functions introduce non-linearities into the network, allowing it to learn


complex patterns in the data.

• Common activation functions are ReLU, Sigmoid, Tanh etc.

• The ReLU function only retains the positive part of the function y = x and sets the negative part
to zero.

• It has a unilateral suppression characteristic. Although simple, the ReLU function has
excellent nonlinear characteristics, easy gradient calculation, and a stable training process.

• It is one of the most widely used activation functions for deep learning models.

• we convert the model to a nonlinear model by embedding the ReLU function:

o = ReLU( Wx+ b )
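
A tiny NumPy illustration of the ReLU definition (keeping the positive part and zeroing the negative part):

import numpy as np

def relu(x):
    # ReLU keeps the positive part of x and sets the negative part to zero
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))
# [0.  0.  0.  1.5 3. ]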

Model Complexity:

• To increase the model complexity, we can repeatedly stack multiple transformations such
as:

h1 = ReLU(W1x + b1)

h2 = ReLU(W2h1 + b2)
o = W3h2 + b3

In the preceding equations, we take:

• the output value h1 of the first-layer neuron as the input of the second-layer neuron .

• Then take the output h2 of the second-layer neuron as the input of the third-layer neuron.

• The output of the last-layer neuron is the model output.

• We call the layer where the input node x is located the input layer.

• The output of each nonlinear module hi along with its parameters Wi and bi is called a
network layer.

• In particular, the layer in the middle of the network is called the hidden layer, and the last
layer is called the output layer.

• This network structure formed by the connection of a large number of neurons is called a
neural network.

• The number of nodes in each layer and the number of layers determine the complexity of
the neural network.
OPTIMISATION METHOD:CLASSIFICATION

• Optimization methods similar to regression can also be used to solve classification


problems.

• For a network model with a single layer:

• we can directly derive the partial derivative expressions of the loss with respect to w and b,
calculate the gradient at each step, and update the parameters w and b using the gradient
descent algorithm.

• As complex nonlinear functions are embedded, the number of network layers and the length
of data features also increase.

• The model becomes very complicated, and it is difficult to manually derive the gradient
expressions.

• once the network structure changes, the model function and corresponding gradient
expressions also change.

• Therefore, it is obviously not feasible to rely on the manual calculation of the gradient.

SOLUTION TO THIS PROBLEM:

• Invention of deep learning frameworks.

• With the help of autodifferentiation technology, deep learning frameworks can build the
neural network's computational graph while calculating each layer's output and the
corresponding loss function, and then automatically compute the gradient of any
parameter.

• Users only need to set up the network structure, and the gradient will automatically be
calculated and updated, which is very convenient and efficient to use.
HANDS ON – CLASSIFICATION

• The first step is to build the network.

• For the first layer of the network, the input is x ∈ R^784.

• The output of the first layer serves as the input to the next layer; let's call it h1 ∈ R^256, so
the output vector has dimension 256.

• We do not need to write h1 = ReLU(W1x + b1) explicitly.

• Since we are using the TensorFlow deep learning framework, we can just write a single line
of code:

layers.Dense(256, activation='relu')

• Next step is how do we build a multi layer network? For an example a three layer network ?

• Solution is using “sequential” function in tensor flow.

# Build a 3-layer network. The output of the 1st layer is the input of the 2nd layer.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(256, activation='relu'),
    layers.Dense(128, activation='relu'),
    layers.Dense(10)])

• After building the three-layer neural network, given the input x, we can call model(x) to get
the model output o and calculate the current loss L:

with tf.GradientTape() as tape:  # Record the gradient calculation
    # Flatten x, [b, 28, 28] => [b, 784]
    x = tf.reshape(x, (-1, 28*28))
    # Step 1. get output [b, 784] => [b, 10]
    out = model(x)
    # [b] => [b, 10]
    y_onehot = tf.one_hot(y, depth=10)
    # Calculate squared error, [b, 10]
    loss = tf.square(out - y_onehot)
    # Calculate the mean squared error, [b]
    loss = tf.reduce_sum(loss) / x.shape[0]

• Then we use the autodifferentiation function from TensorFlow,
tape.gradient(loss, model.trainable_variables), to calculate all the gradients for
θ ∈ {W1, b1, W2, b2, W3, b3}:

grads = tape.gradient(loss, model.trainable_variables)

• Then we use the optimizer object to automatically update the model parameters θ:

# Auto gradient calculation
grads = tape.gradient(loss, model.trainable_variables)

# w' = w - lr * grad, update parameters
optimizer.apply_gradients(zip(grads, model.trainable_variables))

• After multiple iterations, the learned model fθ can be used to predict the categorical
probability of unknown pictures.

• Because the three-layer neural network has relatively strong generalization ability and the
task of handwritten digital picture recognition is relatively simple, the training error
decreases quickly.

• Iterating all training samples once is called one epoch.

• We can test the model’s accuracy and other indicators after several epochs to monitor the
model training effect.
