Structured outputs,
Data types
Mr. Sivadasan E T
Associate Professor
Vidya Academy of Science and Technology, Thrissur
Structured outputs
• A "structured object" in the context of
convolutional neural networks (CNNs) refers to
outputs that go beyond simple classification or
regression values.
• These outputs have complex, meaningful
relationships between their components and
typically represent high-dimensional data with
intricate patterns or structures.
Structured outputs
Convolutional networks can be used to output a high-
dimensional, structured object, rather than just
predicting a class label for a classification task or a
real value for a regression task.
High-Dimensional Tensor Output:
CNNs often emit a tensor as output.
A tensor can be seen as a multi-dimensional grid of
numbers representing probabilities, pixel intensities, or
other information.
Structured outputs
Example - Pixel-Level Classification:
Suppose a CNN produces a tensor S where S_{i,j,k} represents
the probability that pixel (j, k) belongs to class i (like "car"
or "person").
This enables pixel-wise classification rather than
predicting just a single class for the entire image.
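A minimal sketch of such an output tensor, assuming NumPy and an illustrative 3-class problem (the class count, image size, and random scores are assumptions, not from the slides): a softmax over the class axis turns raw network scores into per-pixel class probabilities.

import numpy as np

# Hypothetical raw scores ("logits") from a network head,
# with shape (num_classes, height, width) for one input image.
num_classes, height, width = 3, 4, 5
logits = np.random.randn(num_classes, height, width)

# Softmax over the class axis gives the tensor S, where
# S[i, j, k] is the probability that pixel (j, k) belongs to class i.
S = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)

print(S.shape)           # (3, 4, 5)
print(S[:, 0, 0].sum())  # probabilities at each pixel sum to 1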
Structured outputs
Image Segmentation:
By assigning a class to each pixel, CNNs can create
precise masks that outline individual objects in an
image.
Use Case: Identifying and isolating cars, roads, and
pedestrians in autonomous driving images.
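Continuing the same illustrative setup, a segmentation mask can be read off by taking the most likely class at every pixel (the Dirichlet sampling below merely fabricates a valid probability tensor for the demo):

import numpy as np

# Fabricate a probability tensor S of shape (num_classes, height, width)
# whose entries sum to 1 over the class axis, then take the argmax.
num_classes, height, width = 3, 4, 5
S = np.random.dirichlet(np.ones(num_classes), size=(height, width)).transpose(2, 0, 1)

mask = S.argmax(axis=0)  # shape (height, width); mask[j, k] is a class index
print(mask.shape)        # (4, 5)
print(mask)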
Structured outputs
• Once a prediction for each pixel is made,
various methods can be used to further process
these predictions in order to obtain a
segmentation of the image into regions.
Structured outputs
• The general idea is to assume that large groups
of contiguous pixels tend to be associated with
the same label.
• Graphical models can describe the probabilistic
relationships between neighboring pixels.
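The slides point to graphical models for this step; as a much simpler, hedged illustration of the same idea that neighboring pixels tend to share a label, the sketch below smooths a label mask with a local majority vote (the function name and window size are my own, not a method from the slides):

import numpy as np

def majority_smooth(mask, radius=1):
    # Replace each pixel's label with the most common label in its
    # (2*radius+1) x (2*radius+1) neighborhood -- a crude stand-in for
    # graphical-model smoothing, not the method described in the slides.
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="edge")
    out = np.empty_like(mask)
    for j in range(h):
        for k in range(w):
            window = padded[j:j + 2 * radius + 1, k:k + 2 * radius + 1]
            labels, counts = np.unique(window, return_counts=True)
            out[j, k] = labels[counts.argmax()]
    return out

noisy = np.array([[1, 1, 0, 1],
                  [1, 1, 1, 1],
                  [1, 0, 1, 1],
                  [1, 1, 1, 1]])
print(majority_smooth(noisy))  # isolated 0-pixels are absorbed into the 1-region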
Data Types
The data used with a convolutional network usually
consists of several channels, each channel being the
observation of a different quantity at some point in
space or time.
Data Types
• One advantage of convolutional networks is that
they can also process inputs with varying spatial
extents.
• These kinds of input simply cannot be represented
by traditional, matrix multiplication-based neural
networks.
• This provides a compelling reason to use
convolutional networks even when computational
cost and overfitting are not significant issues.
Data Types
• For example, consider a collection of images,
where each image has a different width and
height.
• It is unclear how to model such inputs with a
weight matrix of fixed size.
Data Types
• Convolution is straightforward to apply; the
kernel is simply applied a different number of
times depending on the size of the input, and the
output of the convolution operation scales
accordingly.
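A small sketch of this behaviour (using scipy.signal.convolve2d purely for illustration; the image sizes are arbitrary): the same 3x3 kernel applied to differently sized inputs yields outputs whose sizes scale with the input.

import numpy as np
from scipy.signal import convolve2d

kernel = np.random.randn(3, 3)

small_image = np.random.randn(8, 8)
large_image = np.random.randn(32, 20)

# The kernel is simply applied a different number of times,
# so the output size follows the input size.
print(convolve2d(small_image, kernel, mode="valid").shape)  # (6, 6)
print(convolve2d(large_image, kernel, mode="valid").shape)  # (30, 18)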
Data Types
1-D Single Channel:
• Audio waveform: The axis we convolve over
corresponds to time.
• We discretize time and measure the amplitude
of the waveform once per time step.
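For example (a hedged sketch; the sample rate, tone frequency, and moving-average kernel are assumptions): a discretized waveform is a 1-D array with one amplitude per time step, and a 1-D convolution slides a small kernel along that time axis.

import numpy as np

sample_rate = 8000                        # samples per second (assumed)
t = np.arange(sample_rate) / sample_rate  # one second of time steps
waveform = np.sin(2 * np.pi * 440 * t)    # a 440 Hz tone, one amplitude per step

kernel = np.ones(5) / 5                   # simple moving-average kernel
smoothed = np.convolve(waveform, kernel, mode="valid")
print(waveform.shape, smoothed.shape)     # (8000,) (7996,)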
Data Types
1-D Multi-Channel:
• Skeleton animation data: 3-D characters are animated
by changing their joint angles over time.
• Each frame records the angles of the different joints,
describing the character's pose.
• In convolutional models, each data channel represents
the angle of one joint about a specific axis.
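A minimal sketch of a multi-channel 1-D convolution over such data (the joint count, frame count, and kernel width are assumed; as in most deep-learning code, the operation is implemented as cross-correlation): one filter row per joint channel, summed into a single output channel.

import numpy as np

num_joints, num_frames = 12, 100
angles = np.random.randn(num_joints, num_frames)    # (channels, time): one joint angle per channel

kernel_width = 5
kernel = np.random.randn(num_joints, kernel_width)  # one filter row per input channel

out_len = num_frames - kernel_width + 1
output = np.array([(angles[:, i:i + kernel_width] * kernel).sum()
                   for i in range(out_len)])
print(output.shape)  # (96,): a single output channel over time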
Data Types
2-D Single Channel:
• Audio data that has been preprocessed with a
Fourier transform:
• We can transform the audio waveform into a 2D
tensor with different rows corresponding to different
frequencies and different columns corresponding to
different points in time.
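As an illustration (using scipy.signal.spectrogram as one common way to do this; the sample rate and test tone are assumptions), the waveform becomes a 2-D array whose rows are frequencies and whose columns are time frames:

import numpy as np
from scipy.signal import spectrogram

sample_rate = 8000
t = np.arange(2 * sample_rate) / sample_rate
waveform = np.sin(2 * np.pi * 440 * t)  # two seconds of a 440 Hz tone

freqs, times, Sxx = spectrogram(waveform, fs=sample_rate)
print(Sxx.shape)  # (num_frequencies, num_time_frames): rows = frequencies, columns = times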
Data Types
2-D Multi-Channel:
Color image data:
• One channel contains the red pixels, one the green
pixels, and one the blue pixels.
• The convolution kernel moves over both the
horizontal and vertical axes of the image, conferring
translation equivariance in both directions.
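A small sketch of a multi-channel 2-D convolution (again using scipy.signal.convolve2d only for illustration; the image and kernel sizes are assumed): one 2-D filter per color channel, with the three channel responses summed into a single output map.

import numpy as np
from scipy.signal import convolve2d

image = np.random.rand(3, 64, 48)   # (channels, height, width): red, green, blue
kernel = np.random.randn(3, 3, 3)   # one 3x3 filter per input channel

# Convolve each channel with its own filter and sum the results.
output = sum(convolve2d(image[c], kernel[c], mode="valid") for c in range(3))
print(output.shape)                 # (62, 46)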
Data Types
3-D Single Channel:
Volumetric data: A common source of this kind of data
is medical imaging technology, such as CT scans.
Data Types
3-D Multi-Channel:
Color video data: One axis corresponds
to time, one to the height of the video frame, and one
to the width of the video frame.
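For completeness, a shapes-only sketch of these two 3-D cases (the particular dimensions are made up): a single-channel CT volume and a multi-channel color video.

import numpy as np

# 3-D single channel: a CT volume indexed by (depth, height, width).
ct_volume = np.random.rand(1, 128, 256, 256)  # (channels, depth, height, width)

# 3-D multi-channel: color video indexed by (time, height, width),
# with one channel per color.
video = np.random.rand(3, 30, 240, 320)       # (channels, frames, height, width)

print(ct_volume.shape, video.shape)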
Thank You!