What is Deep Learning?
Deep learning is a class of machine learning algorithms that:
● Use a cascade of multiple layers of nonlinear processing units for feature
extraction and transformation. Each successive layer uses the output from the
previous layer as input.
● Learn in supervised (e.g., classification) and/or unsupervised (e.g., pattern
analysis) manners.
● Learn multiple levels of representations that correspond to different levels of
abstraction; the levels form a hierarchy of concepts.
Why Deep Learning?
● Limitations of traditional machine learning algorithms
○ not good at handling high dimensional data.
○ difficult to do feature extraction and object recognition.
● Advantages of deep learning
○ DL is computationally expensive, but it is capable of handling high
dimensional data.
○ feature extraction is done automatically.
Types of Neural Network
● Artificial Neural Network
● Convolutional Neural Network
● Recurrent Neural Network
● Generative Adversarial Network
ANN
● ANNs possess a large number of processing elements, called nodes or neurons,
which operate in parallel.
● Neurons are connected to one another by connection links.
● Each link is associated with a weight, which carries information about the
input signal.
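As a rough illustration, the sketch below (plain NumPy, with made-up sizes and values) shows one layer of neurons: each connection link carries a weight, each neuron sums its weighted inputs plus a bias, and a nonlinear activation is passed on to the next layer.

import numpy as np

# A minimal sketch of one ANN layer: 3 inputs feeding 2 neurons (sizes are illustrative).
rng = np.random.default_rng(0)
x = np.array([0.5, -1.2, 3.0])        # input signal
W = rng.normal(size=(2, 3))           # one weight per connection link (2 neurons x 3 inputs)
b = np.zeros(2)                       # one bias per neuron
z = W @ x + b                         # weighted sum at each neuron
a = np.maximum(z, 0.0)                # ReLU activation passed to the next layer
print(a)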
CNN
● Convolutional Neural Networks, or CNNs, are primarily used for tasks
related to computer vision or image processing.
● CNNs are extremely good at modeling spatial data such as 2D or 3D images
and videos.
● They can extract features and patterns within an image, enabling tasks such
as image classification or object detection.
How CNNs Work
The operation of CNNs can be summarized in several key steps:
1. Convolution Operation: Filters slide over the input image to produce feature maps that capture various
aspects of the image.
2. Activation: Non-linear activation functions like ReLU are applied to introduce complexity into the model.
3. Pooling: Spatial dimensions are reduced through pooling layers, which help retain important features while
simplifying computations.
4. Hierarchical Feature Learning: As data passes through multiple layers, CNNs learn increasingly complex
features—from simple edges in early layers to intricate shapes and objects in deeper layers.
5. Classification: The final fully connected layer processes these features to classify or predict outcomes
based on learned patterns.
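A minimal sketch of these five steps in PyTorch, assuming 32x32 RGB inputs and 10 output classes (both are illustrative choices, not taken from the text):

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # 1. convolution -> feature maps
    nn.ReLU(),                                   # 2. non-linear activation
    nn.MaxPool2d(2),                             # 3. pooling reduces spatial size
    nn.Conv2d(16, 32, kernel_size=3, padding=1), # 4. deeper layer, richer features
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),                   # 5. fully connected classifier
)

logits = model(torch.randn(1, 3, 32, 32))        # one fake 32x32 RGB image
print(logits.shape)                              # torch.Size([1, 10])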
RNN
● Recurrent Neural Networks, or RNNs, are primarily used to model
sequential data, such as text, audio, or any other type of data that
represents a sequence or time series.
● They are often used in tasks related to natural language processing (NLP).
● RNNs have memory that helps them use information from past sequences.
● RNNs are designed to handle input sequences of variable length.
● An RNN takes an input, combines it with its internal state to produce an
output, and updates its internal state.
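A minimal sketch of that state update in plain NumPy, with made-up sizes: each step combines the current input with the previous internal state, emits an output, and carries the new state forward.

import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 3                 # illustrative sizes
W_xh = rng.normal(size=(hidden_size, input_size))
W_hh = rng.normal(size=(hidden_size, hidden_size))
W_hy = rng.normal(size=(2, hidden_size))
b_h = np.zeros(hidden_size)

h = np.zeros(hidden_size)                      # internal state (memory)
sequence = rng.normal(size=(5, input_size))    # 5 time steps of toy data
for x_t in sequence:
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)   # update state from input + old state
    y_t = W_hy @ h                             # output at this time step
print(h, y_t)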
Benefits of RNNs
● RNNs are effective at capturing dependencies in sequential data, which
improves prediction accuracy.
● RNNs are capable of recognizing patterns over time, making them ideal for
tasks like speech and handwriting recognition.
GAN
● A GAN is a machine learning model that trains two neural networks to compete
against each other to generate new data.
● The goal of the generator is to create data that looks real, while the
discriminator’s goal is to identify fake data.
● GANs are used to generate new images, music, and more.
● Ian Goodfellow and his colleagues developed the concept in 2014.
● GANs are frameworks used for tasks related to unsupervised learning: the
network learns the structure and patterns of the data well enough to generate
new examples similar to those in the original dataset.
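A minimal sketch of one round of this two-player game in PyTorch; the network sizes, the 8-dimensional toy data, and the 4-dimensional noise vector are all illustrative assumptions.

import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 8))                 # generator
D = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())   # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

real = torch.randn(32, 8)                     # stand-in for a batch of real data

# Discriminator step: label real samples 1 and generated samples 0.
fake = G(torch.randn(32, 4)).detach()
d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake), torch.zeros(32, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: try to make the discriminator output 1 for fakes.
fake = G(torch.randn(32, 4))
g_loss = bce(D(fake), torch.ones(32, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()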
OBJECTIVE FUNCTIONS
● They provide a clear target for the optimization algorithm, allowing it to adjust the
model's internal parameters to minimize the difference between predictions and
ground truth.
1. Mean Absolute Error
● In Regression problems, the intuition is to reduce the difference between the
actual data points and the predicted regression line.
● The magnitude of each error is measured without regard to its direction. Although
it is a simple objective function, it lacks robustness and stability.
● Also known as the L1 loss, its value ranges from 0 to infinity.
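A small sketch of the L1 loss on made-up targets and predictions:

import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])   # illustrative targets
y_pred = np.array([2.5,  0.0, 2.0, 8.0])   # illustrative predictions
mae = np.mean(np.abs(y_true - y_pred))     # magnitude of errors, direction ignored
print(mae)                                 # 0.5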
2. Mean Squared Error (MSE)
● Similar to the mean absolute error, instead of taking the absolute value, it
squares the difference between the actual and the predicted data points.
● The squaring is done to highlight those points which are farther away
from the regression line.
● Mean Squared Error is commonly used as the cost function in regression
problems, and the goal is to reduce the cost function to its global minimum
in order to get the best-fit line to the data.
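The same toy targets and predictions as in the MAE sketch above, now with squared errors, so the point that is off by 1.0 contributes four times as much as one off by 0.5:

import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])
mse = np.mean((y_true - y_pred) ** 2)   # squaring emphasizes points far from the line
print(mse)                              # 0.375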
3. Cross Entropy
● In binary classification problems, where the labels are either 0 or 1, the binary
cross entropy loss function is used; multiclass cross entropy is used for
multi-class classification problems.
● Cross entropy measures the divergence between two probability distributions.
● The difference between the two distributions is large when the cross entropy is
large, and the two are nearly the same when the cross entropy is small.
● The learning speed is fast when the difference is large and slow when the
difference is small. The chances of reaching the global optimum are higher with
the cross entropy loss function because of its fast convergence.
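A minimal sketch of binary cross entropy on made-up labels and predicted probabilities:

import numpy as np

eps = 1e-12                                   # guard against log(0)
y = np.array([1, 0, 1, 1])                    # illustrative labels
p = np.array([0.9, 0.1, 0.8, 0.4])            # illustrative predicted probabilities
bce = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
print(bce)   # large when predictions diverge from the labels, small when they agree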
4. Hinge Loss
● The hinge loss is used for training classifiers and follows the
maximum-margin objective.
● The predicted output in this case should be a raw score, not a probability.
● When the true label (±1) and the predicted output have the same sign and
their product is at least 1, the loss is equal to zero.
● When the signs disagree (or the margin is not met), the loss increases
linearly with the size of the violation. It is used mostly in Support Vector
Machines.
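A minimal sketch of the hinge loss, with made-up labels in {-1, +1} and raw (unsquashed) scores:

import numpy as np

y = np.array([+1, -1, +1, -1])                      # illustrative true labels
scores = np.array([1.5, -0.3, 0.2, 2.0])            # illustrative raw classifier outputs
loss = np.mean(np.maximum(0.0, 1.0 - y * scores))   # zero once y * score >= 1 (margin met)
print(loss)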
OPTIMIZATION ALGORITHMS
A deep learning model comprises many interconnected neurons organized into
layers. Each neuron computes an activation function on the incoming data and
passes the result to the next layer. The activation functions introduce
non-linearity, allowing for complex mappings between inputs and outputs.
Gradient Descent?
● Gradient Descent is an algorithm designed to minimize a function by
iteratively moving towards the minimum value of the function. Picture a hiker
descending a foggy mountain: the hiker starts at a random location and can
only feel the slope of the ground beneath their feet. To reach the valley’s
lowest point, the hiker takes steps in the direction of the steepest descent.
● Gradient Descent aims to find a function’s parameters (weights) that minimize
the cost function. In the case of a deep learning model, the cost function is the
average of the loss for all training samples as given by the loss function.
● While the loss function is a function of the model’s output and the ground
truth, the cost function is a function of the model’s weights and biases.
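A minimal sketch of this idea on a one-parameter least-squares problem; the data and learning rate are made up:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x                                   # true slope is 2
w, lr = 0.0, 0.05                             # weight and learning rate (illustrative)
for _ in range(200):
    grad = np.mean(2 * (w * x - y) * x)       # derivative of the cost w.r.t. w
    w -= lr * grad                            # step in the direction of steepest descent
print(w)                                      # approaches 2.0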
Stochastic Gradient Descent (SGD)
● Stochastic Gradient Descent (SGD) is a variant of the traditional Gradient
Descent optimization algorithm that introduces randomness into the
optimization process to improve convergence speed and potentially escape
local minima.
● Like Gradient Descent, the primary goal of SGD is to minimize the cost
function of a model by iteratively adjusting its parameters (weights). However,
SGD aims to achieve this goal more efficiently by using only a single training
example at a time to inform the update of the model’s parameters.
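The same toy problem as in the gradient descent sketch above, now updated one randomly chosen example at a time; the steps are noisier, but each update is very cheap:

import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x
w, lr = 0.0, 0.02
for _ in range(500):
    i = rng.integers(len(x))                  # pick a single training example
    grad = 2 * (w * x[i] - y[i]) * x[i]       # gradient from that one example only
    w -= lr * grad
print(w)                                      # hovers near 2.0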
AdaGrad (Adaptive Gradient Algorithm)
● Introduces an innovative twist to the conventional Gradient Descent
optimization technique by dynamically adapting the learning rate, allowing for
an effective optimization process.
● AdaGrad aims to fine-tune the model’s parameters to minimize the cost
function, similar to Gradient Descent.
● Its distinctive feature is individually adjusting learning rates for each parameter
based on the historical gradient information for those parameters.
● This leads to more aggressive learning rate adjustments for weights tied to rare
but important features, ensuring these parameters are optimized adequately
when their respective features play a role in predictions.
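A minimal sketch of the per-parameter AdaGrad update in NumPy, applied to a made-up gradient vector: each coordinate's effective learning rate shrinks according to its own accumulated squared gradients.

import numpy as np

def adagrad_step(w, grad, cache, lr=0.1, eps=1e-8):
    cache += grad ** 2                         # running sum of squared gradients
    w -= lr * grad / (np.sqrt(cache) + eps)    # per-parameter scaled step
    return w, cache

w = np.zeros(3)
cache = np.zeros(3)
grad = np.array([0.5, 0.01, 2.0])              # a made-up gradient
w, cache = adagrad_step(w, grad, cache)
print(w)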
RMSprop (Root Mean Square Propagation)
● RMSprop is an adaptive learning rate optimization algorithm designed to address
AdaGrad’s diminishing learning rates issue.
● RMSprop, like its predecessors, aims to optimize the model’s parameters to
minimize the cost function. Its key innovation lies in adjusting the learning rate for
each parameter using a moving average of recent squared gradients, ensuring
efficient and stable convergence.
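A minimal sketch of the RMSprop update: like AdaGrad, but the squared gradients are kept in an exponential moving average, so the effective learning rate does not shrink toward zero. The gradient and hyperparameter values are illustrative.

import numpy as np

def rmsprop_step(w, grad, avg_sq, lr=0.01, beta=0.9, eps=1e-8):
    avg_sq = beta * avg_sq + (1 - beta) * grad ** 2   # moving average of squared grads
    w -= lr * grad / (np.sqrt(avg_sq) + eps)          # per-parameter scaled step
    return w, avg_sq

w, avg_sq = np.zeros(3), np.zeros(3)
w, avg_sq = rmsprop_step(w, np.array([0.5, 0.01, 2.0]), avg_sq)
print(w)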
Adam (Adaptive Moment Estimation)
● Adam combines the best properties of AdaGrad and RMSprop to provide an
optimization algorithm that can handle sparse gradients on noisy problems.
● Adam seeks to optimize the model’s parameters to minimize the cost
function, utilizing adaptive learning rates for each parameter. It uniquely
combines momentum (keeping track of past gradients) and scaling the
learning rate based on the second moments of the gradients, making it
effective for a wide range of problems.
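A minimal sketch of one Adam update: a moving average of gradients (momentum, the first moment) plus a moving average of squared gradients (scaling, the second moment), both bias-corrected. Values are illustrative.

import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad               # first moment (momentum)
    v = b2 * v + (1 - b2) * grad ** 2          # second moment (scaling)
    m_hat = m / (1 - b1 ** t)                  # bias correction
    v_hat = v / (1 - b2 ** t)
    w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w, m, v = np.zeros(3), np.zeros(3), np.zeros(3)
w, m, v = adam_step(w, np.array([0.5, 0.01, 2.0]), m, v, t=1)
print(w)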
LEARNING ALGORITHM
SUPERVISED LEARNING
● Supervised learning is a category of machine learning that uses labeled datasets to train
algorithms to predict outcomes and recognize patterns. Unlike unsupervised learning,
supervised learning algorithms are given labeled training data to learn the relationship
between the inputs and the outputs.
● Supervised machine learning algorithms make it easier for organizations to create complex
models that can make accurate predictions. As a result, they are widely used across various
industries and fields, including healthcare, marketing, financial services, and more.
Types of supervised learning
Classification
Classification algorithms are used to group data by predicting a categorical label or output
variable based on the input data. Classification is used when output variables are categorical,
meaning there are two or more classes.
One of the most common examples of classification algorithms in use is the spam filter in your
email inbox. Here, a supervised learning model is trained to predict whether an email is spam or
not with a dataset that contains labeled examples of both spam and legitimate emails. The
algorithm extracts information about each email, including the sender, the subject line, body
copy, and more. It then uses these features and corresponding output labels to learn patterns
and assign a score that indicates whether an email is real or spam.
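As a toy illustration of this workflow, the sketch below trains a simple logistic-regression classifier (scikit-learn is used here purely for brevity); the email features, values, and labels are all invented for illustration.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row is one email described by two made-up features:
# [number of exclamation marks, subject line contains "free" (0/1)]
X = np.array([[8, 1], [1, 0], [6, 1], [0, 0], [7, 0], [2, 1]])
y = np.array([1, 0, 1, 0, 1, 0])                   # 1 = spam, 0 = legitimate
clf = LogisticRegression().fit(X, y)               # learn patterns from labeled examples
print(clf.predict([[5, 1]]))                       # classify a new, unseen email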
REGRESSION
Regression algorithms are used to predict a real or continuous value, where the
algorithm detects a relationship between two or more variables.
A common example of a regression task might be predicting a salary based on
work experience. For instance, a supervised learning algorithm would be fed
inputs related to work experience (e.g., length of time, the industry or field,
location, etc.) and the corresponding assigned salary amount. After the model is
trained, it could be used to predict the average salary based on work experience.
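A minimal sketch of the salary example with made-up numbers: fit a least-squares line from years of experience to salary, then predict for an unseen value.

import numpy as np

years  = np.array([1, 3, 5, 7, 10], dtype=float)        # illustrative experience values
salary = np.array([40, 52, 63, 75, 95], dtype=float)    # illustrative salaries (thousands)
slope, intercept = np.polyfit(years, salary, deg=1)     # least-squares regression line
print(slope * 6 + intercept)                            # predicted salary at 6 years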
TAGGING
In deep learning, "tagging" refers to the process of assigning relevant labels or
keywords to data (such as images, text, or audio) using a trained model. This
allows the system to automatically categorize and identify key features within the
data, making it easier to search, organize, and analyze information. In essence, it
is a way for a deep learning model to describe the content of a piece of data by
attaching meaningful tags to it.
Key points about tagging in deep learning:
● Data annotation:
Before training a tagging model, data needs to be manually labeled with
relevant tags, providing the model with examples of what to look for.
● Feature extraction:
The model analyzes the data (e.g., pixels in an image, words in a sentence)
to extract features that are indicative of specific tags.
● Classification:
Based on the extracted features, the model predicts the most likely tags to
associate with the data.
Applications of tagging in deep learning:
● Image tagging: Automatically generating tags for images based on objects,
scenes, and people detected within them.
● Video tagging: Identifying key moments in a video and assigning relevant
tags to them.
● Text tagging: Extracting keywords or topics from a text document.
● Product tagging: Assigning attributes like color, size, and material to online
product listings.
How it works:
● Training data: A large dataset of labeled data is used to train the model.
● Model architecture: Convolutional Neural Networks (CNNs) are often used
for image tagging, while Recurrent Neural Networks (RNNs) can be effective
for text tagging.
● Feature extraction: The model learns to extract features from the data that
are most relevant for identifying the desired tags.
● Prediction: Once trained, the model can then predict tags for new, unseen
data.
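A minimal sketch of the prediction step, framed as multi-label classification in PyTorch: a feature vector (standing in for the output of a CNN or RNN feature extractor) is scored against every tag independently. The tag names, sizes, and threshold are illustrative assumptions.

import torch
import torch.nn as nn

tags = ["beach", "sunset", "people", "dog"]      # made-up tag vocabulary
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, len(tags)))

features = torch.randn(1, 128)                   # stand-in for extracted features
probs = torch.sigmoid(model(features))[0]        # one probability per tag
predicted = [t for t, p in zip(tags, probs) if p > 0.5]
print(predicted)                                 # tags attached to this item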
Benefits of tagging with deep learning:
● Efficiency: Automatically tagging large amounts of data saves time and effort
compared to manual tagging.
● Accuracy: Deep learning models can learn complex patterns in data, leading
to more accurate tag suggestions.
● Scalability: Can be applied to a wide range of data types and large datasets.
Web Search
"Web search in deep learning" refers to the application of deep learning
techniques to improve the accuracy and relevance of web search results, where
complex neural networks are used to understand the semantic meaning of search
queries and documents, going beyond simple keyword matching to deliver more
contextually relevant results; essentially, it's a way to make search engines
"smarter" by leveraging deep learning capabilities to interpret user intent better.
Key points about web search in deep learning:
● Semantic Understanding:
Deep learning models can analyze the meaning of words within a query,
considering context and relationships between terms, allowing for better
matching with relevant documents even if the exact keywords aren't present.
● Vector Representations:
Text is transformed into numerical vectors, where similar words are
positioned close together in a multidimensional space, enabling the search
engine to identify semantically related content.
● Neural Search:
This is the term used for a search engine that heavily relies on deep learning
models to process queries and rank documents.
How it works:
● Training Data:
Large datasets of labeled search queries and relevant documents are used
to train the deep learning model.
● Embedding Creation:
Both the search query and documents are converted into numerical vectors
using a neural network, representing their semantic meaning.
● Similarity Calculation:
The search engine compares the query vector to the vectors of indexed
documents, returning the most similar ones as the top results.
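A minimal sketch of the similarity-calculation step, assuming the query and documents have already been embedded into the same vector space by some trained encoder; the vectors here are simply made up.

import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

query = np.array([0.9, 0.1, 0.3])                  # made-up query embedding
docs = {
    "doc_a": np.array([0.8, 0.2, 0.4]),            # made-up document embeddings
    "doc_b": np.array([0.1, 0.9, 0.2]),
}
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked)   # most semantically similar document first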
Benefits of using deep learning in web search:
● Improved Relevance: Can identify relevant results even when the query
uses ambiguous or complex language.
● Contextual Understanding: Takes into account the context of a query, not
just keywords.
● Personalized Search: Can tailor results based on user history and
preferences.
● Image and Video Search Enhancement: Deep learning models can analyze
visual content to provide more accurate image and video search results.
Examples of deep learning techniques used in web search:
● Word Embeddings: Representing words as vectors to capture semantic
relationships.
● Recurrent Neural Networks (RNNs): Useful for understanding sequential
information in text.
● Transformer Models: Advanced neural network architecture that excels at
capturing long-range dependencies in text.
Page Ranking
"Page ranking in deep learning" refers to the application of deep learning
techniques to enhance or improve the traditional PageRank algorithm, which is
used to determine the importance of web pages based on their link structure,
essentially allowing for more nuanced and context-aware page ranking by
leveraging the power of deep learning models to analyze content and user
behavior beyond just links.
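For reference, a minimal sketch of the classic link-based PageRank computation (power iteration) that these deep learning enhancements build on; the 3-page link matrix and damping factor are illustrative.

import numpy as np

# L[i, j] = probability of following a link from page j to page i (columns sum to 1).
L = np.array([[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.5],
              [0.0, 0.5, 0.0]])
d = 0.85                                   # damping factor
rank = np.ones(3) / 3
for _ in range(50):
    rank = (1 - d) / 3 + d * (L @ rank)    # redistribute rank along the links
print(rank)                                # importance score per page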
Key points about Page ranking in deep learning:
● Beyond link analysis:
While the classic PageRank algorithm primarily relies on hyperlinks between
pages, deep learning models can analyze the actual content of pages, user
interactions, and other contextual factors to provide a more accurate ranking
based on relevance and quality.
● Semantic understanding:
Deep learning models like recurrent neural networks (RNNs) or transformers
can understand the semantic meaning of content, allowing them to identify
pages with relevant information even if they don't have many direct links
pointing to them.
● Personalized ranking:
By incorporating user data and behavior patterns into the deep learning
model, it's possible to generate personalized page rankings that cater to
individual user preferences.
● Dynamic updates:
Unlike the traditional PageRank algorithm which might update rankings less
frequently, deep learning models can be used to continuously monitor and
adjust page rankings based on real-time user interactions and content
updates.
Potential applications of deep learning in PageRank:
● Search engine optimization:
By understanding the factors that contribute to a higher ranking based on
deep learning analysis, website owners can optimize their content to improve
their search engine ranking.
● Recommendation systems:
Deep learning-based page ranking can be utilized to recommend relevant
content to users based on their past behavior and interests.
● Content quality assessment:
Deep learning models can be used to identify high-quality content that might
not be well-linked but is still valuable to users.
Challenges and limitations:
● Data requirements:
Training effective deep learning models often requires large amounts of
labeled data which can be challenging to obtain.
● Model complexity:
Complex deep learning models can be computationally expensive to run,
especially for large-scale web search applications.
● Overfitting risk:
Careful design and validation are required to prevent the model from
overfitting to the training data and failing to generalize well to new content.
Recommender systems
A recommender system in deep learning (DL) uses neural networks to analyze user
data and predict what content they will enjoy. DL recommender systems can
process large amounts of data and non-linear data, which makes them more
effective than traditional recommender systems.
How it works
● Feature extraction: DL techniques use neural networks to extract features
from data
● Embedding: DL techniques use embeddings to represent entities as vectors of
numbers
● Sequence modeling: DL techniques use recurrent neural networks (RNNs) or
transformer-based architectures to process sequences
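A minimal sketch of the embedding idea: users and items are represented as vectors and scored by a dot product. The embeddings below are made up; in a real DL recommender system they would be learned by a neural network.

import numpy as np

user_emb = np.array([0.9, 0.1, 0.4])               # made-up user embedding
item_embs = {
    "item_a": np.array([0.8, 0.0, 0.5]),           # made-up item embeddings
    "item_b": np.array([0.1, 0.9, 0.1]),
}
scores = {item: user_emb @ emb for item, emb in item_embs.items()}
print(max(scores, key=scores.get))                 # item predicted to be most enjoyable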
Benefits
● Personalization: DL recommender systems can personalize
recommendations for individual users
● Diverse data: DL recommender systems can handle diverse data formats.
● Non-linear data: DL recommender systems can process non-linear data,
which traditional recommender systems cannot.
Types of DL recommender systems
● Content-based: Recommends items similar to those that a user has rated
highly.
● Collaborative filtering: Recommends items based on similarities between
users and/or items.
● Hybrid: Combines content-based and collaborative filtering approaches
Sequence Learning
Sequential deep learning is a technique that uses deep learning to process data
that is sequential in nature. This data can be in the form of text, speech, video, or
other types of data.
What is sequential data?
● Sequential data is data that is dependent on other data points in the same
dataset.
● Examples of sequential data include time series, DNA sequences, and
meteorological data.
What are some examples of sequential deep learning tasks?
● Stock price forecasting: Predicting future stock prices based on historical data
● Text mining: Analyzing text data to extract meaning
● Sentiment analysis: Classifying the sentiment of text data
● Machine translation: Translating text from one language to another
● Voice recognition: Recognizing speech
● Natural language understanding: Understanding the meaning of natural language
● DNA sequence analysis: Analyzing DNA sequences
What are some deep learning models that can be used for sequential learning?
● Recurrent neural networks: Can remember events that happened in the
past.
● LSTMs: A type of recurrent neural network that can store information for many
time steps.
● Sequential (1-D) CNN models: convolutional layers applied along the time
axis of a sequence.
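A minimal sketch of an LSTM reading a sequence in PyTorch; the sizes (10 time steps, 8 input features, 16 hidden units, 1 output) are illustrative.

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
head = nn.Linear(16, 1)

x = torch.randn(1, 10, 8)            # (batch, time steps, features)
outputs, (h_n, c_n) = lstm(x)        # hidden state carries information across steps
prediction = head(h_n[-1])           # use the final hidden state for the prediction
print(prediction.shape)              # torch.Size([1, 1])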
UNSUPERVISED LEARNING
Unsupervised learning in artificial intelligence is a type of machine learning that learns from
data without human supervision. Unlike supervised learning, unsupervised machine learning
models are given unlabeled data and allowed to discover patterns and insights without any
explicit guidance or instruction.
Whether you realize it or not, artificial intelligence and machine learning are impacting every
aspect of daily life, helping to turn data into insights that can improve efficiencies, reduce
costs, and better inform decision-making. Today, businesses are using machine learning
algorithms to help power personalized recommendations, real-time translations, or even
automatically generate text, images, and other types of content.
Here, we’ll cover the basics of unsupervised machine learning, how it works, and some of its
common real-life applications.
Clustering
Clustering is a technique for exploring raw, unlabeled data and breaking it down into
groups (or clusters) based on similarities or differences. It is used in a variety of
applications, including customer segmentation, fraud detection, and image analysis.
Clustering algorithms split data into natural groups by finding similar structures or
patterns in uncategorized data.
Clustering is one of the most popular unsupervised machine learning approaches.
There are several types of unsupervised learning algorithms that are used for
clustering, which include exclusive, overlapping, hierarchical, and probabilistic.
● Exclusive clustering: Data is grouped in a way where a single data point can
only exist in one cluster. This is also referred to as “hard” clustering. A common
example of exclusive clustering is the K-means clustering algorithm, which
partitions data points into a user-defined number K of clusters.
● Overlapping clustering: Data is grouped in a way where a single data point
can exist in two or more clusters with different degrees of membership. This is
also referred to as “soft” clustering.
● Hierarchical clustering: Data is divided into distinct clusters based on
similarities, which are then repeatedly merged and organized based on their
hierarchical relationships. There are two main types of hierarchical clustering:
agglomerative and divisive clustering. This method is also referred to as
hierarchical cluster analysis (HCA).
● Probabilistic clustering: Data is grouped into clusters based on the
probability of each data point belonging to each cluster. This approach differs
from the other methods, which group data points based on their similarities to
others in a cluster.
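A minimal sketch of K-means, the exclusive ("hard") clustering example mentioned above, on made-up 2-D points with K = 2:

import numpy as np

rng = np.random.default_rng(0)
# Two made-up blobs of points, one near (0, 0) and one near (5, 5).
points = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])
centers = points[[0, -1]].copy()     # one initial center from each blob (illustrative)

for _ in range(10):
    # assign each point to its nearest center, then move centers to the cluster means
    labels = np.argmin(np.linalg.norm(points[:, None] - centers[None], axis=2), axis=1)
    centers = np.array([points[labels == k].mean(axis=0) for k in range(2)])
print(centers)   # one center near (0, 0), one near (5, 5)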
Reinforcement learning(RL)
Unlike supervised learning, which relies on a training dataset with predefined
answers, RL involves learning through experience. In RL, an agent learns to
achieve a goal in an uncertain, potentially complex environment by performing
actions and receiving feedback through rewards or penalties.
● Input: The input is an initial state from which the model will start.
● Output: There are many possible outputs, as there are a variety of solutions to
a particular problem.
● Training: Training is based on the input; the model returns a state, and the
user decides to reward or punish the model based on its output.
● The model continues to learn.
● The best solution is decided based on the maximum reward.
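A minimal sketch of this loop using tabular Q-learning on a made-up 5-state corridor: the agent starts in state 0 and is rewarded only for reaching state 4; all numbers are illustrative.

import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2              # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))     # the agent's value estimates
alpha, gamma, eps = 0.5, 0.9, 0.3

for _ in range(200):                    # episodes
    s = 0                               # input: the initial state
    while s != 4:
        # choose an action (mostly greedy, sometimes exploratory)
        a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next = max(0, s - 1) if a == 0 else min(4, s + 1)
        r = 1.0 if s_next == 4 else 0.0                       # reward feedback
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next
print(np.argmax(Q[:4], axis=1))         # learned best action per non-terminal state ("right")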
OVERFITTING AND UNDERFITTING
To evaluate how well a model learns and generalizes, we monitor its performance on both
the training data and a separate validation or test dataset which is often measured by its
accuracy or prediction errors. However, achieving this balance can be challenging. Two
common issues that affect a model’s performance and generalization ability are overfitting
and underfitting. These problems are major contributors to poor performance in machine
learning models.
Reasons for Underfitting:
1. The model is too simple, so it may not be capable of representing the complexities in
the data.
2. The input features used to train the model are not adequate representations of the
underlying factors influencing the target variable.
3. The size of the training dataset is not sufficient.
4. Excessive regularization is used to prevent overfitting, which constrains the model too
much for it to capture the data well.
Techniques to Reduce Underfitting
1. Increase model complexity.
2. Increase the number of features, performing feature engineering.
3. Remove noise from the data.
4. Increase the number of epochs or increase the duration of training to get better results.
Techniques to Reduce Overfitting
● Improving the quality of training data reduces overfitting by focusing on meaningful patterns and
mitigating the risk of fitting noise or irrelevant features.
● Increasing the amount of training data can improve the model’s ability to generalize to unseen data
and reduce the likelihood of overfitting.
● Reduce model complexity.
● Early stopping during the training phase (monitor the validation loss during training and stop as
soon as it begins to increase).
● Ridge Regularization and Lasso Regularization.
● Use dropout for neural networks to tackle overfitting.
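A minimal sketch combining two of these techniques, dropout and early stopping, in PyTorch; the network, the random stand-in data, and the patience value are all illustrative.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x_tr, y_tr = torch.randn(80, 10), torch.randn(80, 1)     # stand-in training split
x_va, y_va = torch.randn(20, 10), torch.randn(20, 1)     # stand-in validation split

best_val, patience, bad_epochs = float("inf"), 5, 0
for epoch in range(100):
    model.train()
    opt.zero_grad()
    loss_fn(model(x_tr), y_tr).backward()
    opt.step()

    model.eval()
    with torch.no_grad():
        val_loss = loss_fn(model(x_va), y_va).item()
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:      # validation loss stopped improving: stop training
            break
print(epoch, best_val)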
Hyperparameters and Validation Sets
Hyperparameters control ML behavior
• Most ML algorithms have hyperparameters
– We can use them to control algorithm behavior
– Values of hyperparameters are not adapted by the learning algorithm itself.
• However, we can design nested learning, where one learning algorithm
– learns the best hyperparameters for another learning algorithm
Validation Set
To choose hyperparameters without using the test set, we use a validation set
– Examples that the training algorithm does not observe
• Test examples should not be used to make choices about the model hyperparameters
• Training data is split into two disjoint parts
– The first is used to learn the parameters
– The other is the validation set, used to estimate the generalization error during or after training,
allowing the hyperparameters to be updated
– Typically 80% of the training data is used for training and 20% for validation
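A minimal sketch of that 80/20 split on made-up data: the training part fits the parameters, and the validation part is used only to estimate generalization error and compare hyperparameter choices.

import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 5)), rng.normal(size=100)   # stand-in dataset

idx = rng.permutation(len(X))                 # shuffle before splitting
split = int(0.8 * len(X))
train_idx, val_idx = idx[:split], idx[split:]
X_train, y_train = X[train_idx], y[train_idx] # used to learn the parameters
X_val, y_val = X[val_idx], y[val_idx]         # used to estimate generalization error
print(len(X_train), len(X_val))               # 80 20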