Deep Learning


Deep learning

Deep learning employs artificial neural networks with many layers—hence “deep”—rather than
the explicitly designed algorithms of traditional machine learning. Though neural networks were
introduced early in the history of machine learning, it wasn’t until the late 2000s and early 2010s,
enabled in part by advancements in GPUs, that they became dominant in most subfields of AI.

Loosely inspired by the human brain, neural networks comprise interconnected layers of
“neurons” (or nodes), each of which performs its own mathematical operation (called an
“activation function”). The output of each node’s activation function serves as input to each of
the nodes of the following layer and so on until the final layer, where the network’s final output
is computed. Crucially, the activation functions performed at each node are nonlinear, enabling
neural networks to model complex patterns and dependencies.
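
To illustrate why the nonlinearity matters, here is a minimal sketch with hand-picked, purely hypothetical weights. Two stacked linear layers with no activation between them collapse into a single linear layer, while inserting a ReLU (one common activation choice, assumed here) produces a bent line that no single linear layer can reproduce.

```python
def linear(w, b, x):
    """A one-input, one-output linear layer: w * x + b."""
    return w * x + b


def relu(z):
    """Rectified linear unit, a widely used nonlinear activation."""
    return max(0.0, z)


def two_linear_layers(x):
    # No activation between the layers: the composition is still a straight line.
    return linear(-3.0, 1.0, linear(2.0, 0.5, x))


def collapsed_single_layer(x):
    # The same function as one layer: w = -3 * 2 = -6, b = -3 * 0.5 + 1 = -0.5.
    return linear(-6.0, -0.5, x)


def two_layers_with_relu(x):
    # The ReLU zeroes out negative intermediate values, bending the line.
    return linear(-3.0, 1.0, relu(linear(2.0, 0.5, x)))


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
linear_outputs = [two_linear_layers(x) for x in xs]
collapsed_outputs = [collapsed_single_layer(x) for x in xs]
relu_outputs = [two_layers_with_relu(x) for x in xs]

print(linear_outputs)  # identical to collapsed_outputs
print(relu_outputs)    # flat for negative inputs, sloped afterwards
```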

Each connection between two neurons is assigned its own weight: a multiplier that increases or
decreases one neuron’s contribution to a neuron in the following layer. These weights, along
with the bias terms added at each neuron, are the parameters to be optimized through machine learning.
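
As a rough sketch of how weights and biases enter the computation, the snippet below evaluates one fully connected layer and counts the trainable parameters of a small network. The layer sizes, weight values and the choice of a sigmoid activation are illustrative assumptions, not taken from the text.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs, weights, biases):
    """weights[j][i] scales input i's contribution to output neuron j."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(sigmoid(z))
    return outputs


def count_parameters(layer_sizes):
    """Weights plus biases for a fully connected network with these layer widths."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical layer: 3 inputs feeding 2 neurons.
weights = [[0.5, -1.0, 0.25], [1.5, 0.0, -0.5]]
biases = [0.0, -1.0]
activations = dense_layer([1.0, 2.0, 4.0], weights, biases)
total = count_parameters([3, 2])  # 3 * 2 weights + 2 biases

print(activations, total)
```

Even a modest network with layer widths 784, 128 and 10 already has over one hundred thousand parameters under this count, which gives a sense of how quickly the numbers grow.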

The backpropagation algorithm enables the computation of how each individual node contributes
to the overall output of the loss function, allowing even millions or billions of model weights to
be individually optimized through gradient descent algorithms. Because of the volume and
granularity of updates required to achieve optimal results, deep learning requires very large
amounts of data and computational resources compared to traditional ML.
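
The following is a minimal sketch of backpropagation and gradient descent on a deliberately tiny network: one input, two tanh hidden neurons and one linear output, trained with mean squared error to imitate a sine curve. The architecture, initial values, learning rate and target function are all assumptions chosen for illustration; real frameworks compute these gradients automatically.

```python
import math


def forward(params, x):
    w1, b1, w2, b2 = params
    hidden = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
    y = sum(v * h for v, h in zip(w2, hidden)) + b2
    return hidden, y


def loss(params, data):
    """Mean squared error over the dataset."""
    return sum((forward(params, x)[1] - t) ** 2 for x, t in data) / len(data)


def gradients(params, data):
    """Backpropagation: apply the chain rule from the loss back to each parameter."""
    w1, b1, w2, b2 = params
    n = len(data)
    g_w1, g_b1 = [0.0] * len(w1), [0.0] * len(b1)
    g_w2, g_b2 = [0.0] * len(w2), 0.0
    for x, t in data:
        hidden, y = forward(params, x)
        d_y = 2.0 * (y - t) / n                  # dLoss/dy for this sample
        g_b2 += d_y
        for j, h in enumerate(hidden):
            g_w2[j] += d_y * h                   # output weight j
            d_z = d_y * w2[j] * (1.0 - h * h)    # back through tanh
            g_w1[j] += d_z * x
            g_b1[j] += d_z
    return g_w1, g_b1, g_w2, g_b2


def gradient_descent_step(params, grads, lr):
    w1, b1, w2, b2 = params
    g_w1, g_b1, g_w2, g_b2 = grads
    return ([w - lr * g for w, g in zip(w1, g_w1)],
            [b - lr * g for b, g in zip(b1, g_b1)],
            [w - lr * g for w, g in zip(w2, g_w2)],
            b2 - lr * g_b2)


data = [(x / 4.0, math.sin(x / 4.0)) for x in range(-8, 9)]
params = ([0.5, -0.3], [0.1, 0.2], [0.4, -0.6], 0.0)
initial_loss = loss(params, data)
for _ in range(500):
    params = gradient_descent_step(params, gradients(params, data), lr=0.05)
final_loss = loss(params, data)

print(initial_loss, final_loss)
```

Every parameter receives its own gradient and its own update, which is the "individually optimized" behavior described above, only at a scale of seven parameters rather than billions.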

That distributed structure affords deep learning models their incredible power and versatility.
Imagine training data as data points scattered on a 2-dimensional graph. Essentially, traditional
machine learning aims to fit a single curve of a predetermined form to those data points;
deep learning pieces together an arbitrary number of smaller, individually adjustable lines to
form the desired shape. Neural networks are universal approximators: it has been theoretically
proven that for any continuous function on a bounded domain, there exists a neural network
arrangement that can approximate it to any desired accuracy.
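
The "many small lines" picture can be sketched directly. The code below builds, by hand rather than by training, a one-hidden-layer ReLU network that linearly interpolates a target function at evenly spaced knots, then measures how the error shrinks as hidden units are added. The target (a sine arch), the knot placement and the helper names are assumptions made for this illustration.

```python
import math


def relu(z):
    return max(0.0, z)


def build_relu_network(f, lo, hi, pieces):
    """Return (knots, coefficients, offset) for a one-hidden-layer ReLU network
    that linearly interpolates f at evenly spaced knots on [lo, hi]."""
    step = (hi - lo) / pieces
    knots = [lo + k * step for k in range(pieces)]
    slopes = [(f(x + step) - f(x)) / step for x in knots]
    # Each hidden unit contributes the change in slope that begins at its knot.
    coeffs = [slopes[0]] + [slopes[k] - slopes[k - 1] for k in range(1, pieces)]
    return knots, coeffs, f(lo)


def evaluate(network, x):
    knots, coeffs, offset = network
    return offset + sum(c * relu(x - k) for k, c in zip(knots, coeffs))


def max_error(f, network, lo, hi, samples=1000):
    xs = [lo + (hi - lo) * i / samples for i in range(samples + 1)]
    return max(abs(f(x) - evaluate(network, x)) for x in xs)


coarse = build_relu_network(math.sin, 0.0, math.pi, pieces=4)
fine = build_relu_network(math.sin, 0.0, math.pi, pieces=32)
coarse_error = max_error(math.sin, coarse, 0.0, math.pi)
fine_error = max_error(math.sin, fine, 0.0, math.pi)

print(coarse_error, fine_error)
```

This only shows that a suitable arrangement exists for one well-behaved function; it says nothing about whether training would find it.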

Having said that, just because something is theoretically possible doesn’t mean it’s practically
achievable through existing training methods. For many years, adequate performance on certain
tasks remained out of reach even for deep learning models—but over time, modifications to the
standard neural network architecture have unlocked new capabilities for ML models.

Convolutional neural networks (CNNs)

Convolutional neural networks (CNNs) add convolutional layers to neural networks. In
mathematics, a convolution is an operation where one function modifies (or convolves) the shape
of another. In CNNs, convolutional layers are used to extract important features from data
by applying weighted “filters”. CNNs are primarily associated with computer vision models and
image data, but have a number of other important use cases.
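
As a small sketch of what applying a weighted filter means, the code below slides a 3x3 filter over a tiny image and records the weighted sum at each position. The image and the hand-picked edge filter are hypothetical; in a trained CNN the filter weights are learned rather than chosen by hand.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, the operation that CNN layers
    conventionally call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


# 4x6 image: dark left half (0), bright right half (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(4)]

# Hand-picked filter that responds to dark-to-bright vertical edges.
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

feature_map = conv2d(image, vertical_edge)
print(feature_map)  # [[0, 3, 3, 0], [0, 3, 3, 0]]
```

The output is large only where the filter window straddles the edge, which is the sense in which the filter "extracts" that feature.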

Recurrent neural networks (RNNs)

Recurrent neural networks (RNNs) are designed to work on sequential data. Whereas
conventional feedforward neural networks map a single input to a single output, RNNs map
a sequence of inputs to an output by operating in a recurrent loop in which the output for a given
step in the input sequence serves as input to the computation for the following step. In effect this
creates an internal “memory,” called the hidden state, that allows RNNs to understand context
and order.
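
The recurrent loop can be sketched with a single scalar hidden state. This is a simplified illustration with hypothetical weights (`w_x`, `w_h`, `b`); practical RNNs use vectors and learned weight matrices.

```python
import math


def rnn_step(x, h_prev, w_x, w_h, b):
    """One recurrent step: the new hidden state mixes the current input
    with the hidden state carried over from the previous step."""
    return math.tanh(w_x * x + w_h * h_prev + b)


def run_rnn(sequence, w_x=0.8, w_h=0.5, b=0.0):
    h = 0.0  # hidden state starts empty
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_x, w_h, b)
        states.append(h)
    return states


forward_states = run_rnn([1.0, 0.0, -1.0])
reversed_states = run_rnn([-1.0, 0.0, 1.0])

print(forward_states[-1], reversed_states[-1])
```

Both sequences contain the same values, yet the final hidden states differ, because each step depends on everything that came before it.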

Transformers

Transformer models, first introduced in 2017, are largely responsible for the advent of LLMs and
other pillars of generative AI, achieving state-of-the-art results across most subdomains of
machine learning. Like RNNs, transformers are ostensibly designed for sequential data, but
clever workarounds have enabled most data modalities to be processed by transformers. The
unique strength of transformer models comes from their innovative attention mechanism, which
enables the models to selectively focus on the parts of the input data most relevant at a specific
moment in a sequence.
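
One widely used form of attention is scaled dot-product attention, sketched below for a single query. The query, keys and values are made-up vectors; in an actual transformer they are produced by learned projections of the input.

```python
import math


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, output


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, output = attention([4.0, 0.0], keys, values)

print(weights, output)
```

The attention weights sum to 1, and the positions whose keys align with the query receive most of the weight, so the output is dominated by their values.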

Mamba models

Mamba models are a relatively new neural network architecture, first introduced in 2023, based
on a unique variation of state space models (SSMs). Like transformers, Mamba models provide
an innovative means of selectively prioritizing the most relevant information at a given moment.
Mamba has recently emerged as a rival to the transformer architecture, particularly for LLMs.
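
The snippet below is a loose toy, not Mamba's actual implementation: a scalar state space recurrence whose discretization step size depends on the current input, which is one way to picture "selective" behavior. All parameter names and values (`a`, `b`, `c`, `w_delta`, `bias_delta`) are hypothetical.

```python
import math


def softplus(z):
    return math.log1p(math.exp(z))


def ssm_scan(inputs, selective=True, a=-1.0, b=1.0, c=1.0,
             w_delta=4.0, bias_delta=-4.0):
    """Toy scalar state space recurrence:
    h_t = a_bar * h_{t-1} + b_bar * x_t,  y_t = c * h_t."""
    h = 0.0
    outputs = []
    for x in inputs:
        # Selective variant: the step size is a function of the input itself.
        delta = softplus(w_delta * x + bias_delta) if selective else 1.0
        a_bar = math.exp(delta * a)        # zero-order-hold discretization
        b_bar = (a_bar - 1.0) / a * b
        h = a_bar * h + b_bar * x
        outputs.append(c * h)
    return outputs


signal = [2.0, 0.0, 0.0, 0.0]  # one informative input followed by filler
selective_outputs = ssm_scan(signal, selective=True)
fixed_outputs = ssm_scan(signal, selective=False)

print(selective_outputs[-1], fixed_outputs[-1])
```

With the input-dependent step, the large input overwrites the state and the zeros that follow barely disturb it, whereas the fixed-step version forgets the input within a few steps.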

Machine learning use cases


Most applications of machine learning fall into one or more of the following categories, which
are defined primarily by their use cases and the data modalities they operate upon.

Computer vision

Computer vision is the subdomain of AI concerned with image data, video data and other data
modalities that require a model or machine to “see,” from healthcare diagnostics to facial
recognition to self-driving cars. Notable subfields of computer vision include image
classification, object detection, image segmentation and optical character recognition (OCR).

Natural language processing (NLP)

The field of natural language processing (NLP) spans a diverse array of tasks concerning text,
speech and other language data. Notable subdomains of NLP include chatbots, speech
recognition, language translation, sentiment analysis, text generation, summarization and AI
agents. In modern NLP, large language models continue to advance the state of the art at an
unprecedented pace.

Time series analysis

Time series models are applied to anomaly detection, market analysis and related pattern
recognition or prediction tasks. They use machine learning on historical data for a variety of
forecasting use cases.
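
As a point of reference for anomaly detection on historical data, here is a rolling z-score check. It is a simple statistical baseline rather than a learned model, and the window size, threshold and sample readings are assumptions chosen for the example.

```python
import statistics


def rolling_anomalies(series, window=5, threshold=3.0):
    """Flag indices whose value lies more than `threshold` standard deviations
    from the mean of the preceding `window` values."""
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mean = statistics.mean(history)
        spread = statistics.stdev(history)
        if spread > 0 and abs(series[i] - mean) / spread > threshold:
            flagged.append(i)
    return flagged


readings = [10.0, 10.2, 9.9, 10.1, 10.0, 10.3, 9.8, 25.0, 10.1, 9.9]
anomalies = rolling_anomalies(readings)
print(anomalies)  # [7]
```

Note that the spike inflates the spread of the windows that follow it, so the readings immediately after an anomaly are judged against a much looser baseline.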

Image generation

Diffusion models, variational autoencoders (VAEs) and generative adversarial networks
(GANs) can be used to generate original images that apply pixel patterns learned from training
data.

Machine learning operations (MLOps)

Machine learning operations (MLOps) is a set of practices for implementing an assembly line
approach to building, deploying and maintaining machine learning models.

Careful curation and preprocessing of training data, as well as appropriate model selection, are
crucial steps in the MLOps pipeline. Thoughtful post-training validation, from the design of
benchmark datasets to the prioritization of particular performance metrics, is necessary to ensure
that a model generalizes well (and isn’t just overfitting the training data).
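
A held-out validation set is one basic way to spot overfitting. The sketch below compares a least-squares line with a model that simply memorizes its training points, on synthetic data invented for this example (a fixed +/-1 disturbance stands in for noise).

```python
def mse(model, points):
    return sum((model(x) - y) ** 2 for x, y in points) / len(points)


def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_nearest_neighbor(points):
    """Memorizes the training set and returns the label of the closest x."""
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]


# Synthetic data: y = 2x plus a fixed +/-1 disturbance.
noise = [1.0, -1.0, -1.0, 1.0]
data = [(float(i), 2.0 * i + noise[i % 4]) for i in range(20)]
train = [p for i, p in enumerate(data) if i % 4 in (0, 1)]
validation = [p for i, p in enumerate(data) if i % 4 in (2, 3)]

line, memorizer = fit_line(train), fit_nearest_neighbor(train)
report = {
    "line": (mse(line, train), mse(line, validation)),
    "memorizer": (mse(memorizer, train), mse(memorizer, validation)),
}
print(report)
```

The memorizer scores a perfect zero on the training data but does much worse on the held-out points, while the line performs about equally on both: the gap between training and validation error is the signal to watch.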

Following deployment, models must be monitored for model drift, inference efficiency issues
and other adverse developments. A well-defined practice of model governance is essential to
continued efficacy, especially in regulated or fast-changing industries.
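
One common way to monitor for drift in an input feature is the population stability index (PSI), sketched below. The bin edges, the sample data and the small floor used for empty bins are assumptions for this illustration; a commonly cited rule of thumb treats values above roughly 0.25 as a significant shift, though thresholds vary by team.

```python
import math


def population_stability_index(reference, live, edges):
    """PSI between two samples over shared bin edges; larger means more drift."""
    def proportions(sample):
        counts = [0] * (len(edges) + 1)
        for value in sample:
            counts[sum(value >= e for e in edges)] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(sample), 1e-6) for c in counts]

    expected, actual = proportions(reference), proportions(live)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


edges = [0.25, 0.5, 0.75]
reference = [i / 100 for i in range(100)]        # feature values at training time
stable = [(i + 0.5) / 100 for i in range(100)]   # same distribution in production
shifted = [0.5 + i / 200 for i in range(100)]    # values drifted upward

psi_stable = population_stability_index(reference, stable, edges)
psi_shifted = population_stability_index(reference, shifted, edges)
print(psi_stable, psi_shifted)
```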

Machine learning libraries


A number of open source tools, libraries and frameworks exist for building, training and testing
machine learning projects. While such libraries offer an array of pre-configured modules and
abstractions to streamline the process of building ML-based models and workflows, practitioners
will need to familiarize themselves with commonly used programming languages, particularly
Python, to make full use of them.

Prominent open source libraries, particularly for building deep learning models,
include PyTorch, TensorFlow, Keras and the Hugging Face Transformers library.

Notable open source machine learning libraries and toolkits focused on traditional ML include
Pandas, Scikit-learn, XGBoost, Matplotlib, SciPy and NumPy among many others.

IBM itself maintains and updates a significant library of tutorials for beginners and advanced ML
practitioners alike.
