Video Database
The aim of a video database system is to provide capabilities for storing, retrieving, and
presenting video information with performance comparable to that of traditional databases.
When video databases were first developed, retrieval was based on sequential, time-consuming
searches through a whole video. The sheer size of video objects poses a formidable challenge to
database systems: even with MPEG compression, a one-hour video can require gigabytes of
storage. To achieve the performance capabilities of a traditional DBMS, we need to deploy
technologies derived not only from databases but also from disciplines such as image processing,
pattern recognition, data security, networking, and human-computer interaction. Videos also
differ from other media in that they are far more complex. Few videos consist purely of video
data. A typical video will have a
soundtrack containing music, speech, and other sounds, text appearing in the video sequence,
and possibly closed-caption text used to provide subtitles for the hard of hearing.
The field of video data management has advanced rapidly in the last decade. For example, the
automatic identification and separation of whole scenes from a video is now possible. Another
advance is the ability to automatically extract short video clips representing the key features of
much longer sequences, providing users with far more information than still frames could
provide. Perhaps even more useful is the query-by-motion capability demonstrated by the
experimental VideoQ system, which allows users to specify how an object moves across the
screen during a video clip, as well as its color or shape.
Video sequences are an increasingly important form of media data and pose special challenges to
database designers and implementers because of their storage and retrieval requirements. Video
images are complex and contain a wide range of primitive image types as well as motion vectors.
Video objects can take hours to review, while the comparable process for still images takes
seconds at most. Therefore the process of video retrieval will contain aspects akin to the
abstracting and indexing of long text documents as well as aspects encountered in image
retrieval.
Role of Video Feature Extraction
Video can be processed to extract audiovisual features such as the following (a brief extraction sketch appears after the list):
● image-based features;
● motion-based features;
● object detection and tracking;
● speech recognition;
● speaker identification;
● word spotting;
● audio classification.
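As an illustration of the first two items, the sketch below computes a per-frame color histogram (a simple image-based feature) and a crude frame-difference motion score. It assumes OpenCV is available; the file path and function name are hypothetical, and production systems would use far richer descriptors.

```python
import cv2
import numpy as np

def extract_basic_features(path):
    """Yield (color_histogram, motion_score) for each frame of a video."""
    cap = cv2.VideoCapture(path)  # 'path' is a hypothetical example file
    prev_gray = None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Image-based feature: a coarse 8x8x8 BGR color histogram, L2-normalized.
        hist = cv2.calcHist([frame], [0, 1, 2], None,
                            [8, 8, 8], [0, 256, 0, 256, 0, 256])
        hist = cv2.normalize(hist, None).flatten()
        # Motion-based feature: mean absolute inter-frame difference.
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        motion = 0.0 if prev_gray is None else float(
            np.mean(cv2.absdiff(gray, prev_gray)))
        prev_gray = gray
        yield hist, motion
    cap.release()
```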
Querying Video Libraries
Visit this link:
https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.22.3457&rep=rep1&type=pdf
Video Object Segmentation
Segmenting and tracking the objects of interest in a video is critical for effectively analyzing
and using video big data. Segmentation and tracking of video objects are two basic tasks in
computer vision. An object segmentation mask is produced by dividing the pixels of a video
frame into two groups: the foreground target and the background region. This is the crux of the
problem in applications such as behavior recognition and video retrieval.
Object tracking determines the exact location of the target in the video image and generates the
object bounding box, which is required for intelligent monitoring, big data video analysis, and
other applications. The segmentation and tracking of video objects appear to be separate issues,
but they are actually intertwined.
That is, solving one problem usually necessitates solving the other, either implicitly or
explicitly. Clearly, solving the object segmentation problem makes the object tracking
problem simple. On the one hand, accurate segmentation results provide reliable object
observations for tracking, which can help resolve issues such as occlusion during tracking.
Accurate object tracking results, on the other hand, can be used to guide the segmentation
algorithm in determining the object position, reducing the impact of fast object movement,
complex backgrounds, similar objects, and other factors, and improving object segmentation
performance.
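One direction of this coupling is easy to see concretely: once a per-frame segmentation mask is available, the tracking output (a bounding box) follows almost for free. A minimal NumPy sketch, with the function name and mask format chosen for illustration:

```python
import numpy as np

def mask_to_bbox(mask):
    """Return (x_min, y_min, x_max, y_max) for a binary foreground mask,
    or None if the mask contains no foreground pixels."""
    ys, xs = np.where(mask)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```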
Many studies have found that processing object segmentation and tracking problems at the same
time can help overcome their respective difficulties and improve performance. Video object
segmentation (VOS) and video object tracking (VOT) are two major tasks that are closely
related to each other.
Methods of Video Object Segmentation
In this section, video object segmentation and tracking methods are divided into two categories:
unsupervised and semi-supervised methods. Let’s look at each one individually.
1) Unsupervised Video Object Segmentation
Unsupervised methods assume no human input on the video at test time. They aim to extract the
most salient spatio-temporal object tube by grouping pixels that are consistent in appearance
and motion. In general, they assume that the objects to be segmented and tracked have
distinctive motions or appear frequently in the image sequence.
Early video segmentation techniques were primarily geometric in nature and were limited to
scenes with specific types of background motion. The traditional background subtraction method
models the appearance of the background at each pixel and treats rapidly changing pixels as
foreground. Any significant difference between the current image and the background model
indicates a moving object. The pixels that make up the changed region are flagged for further
processing.
The connected region corresponding to the object is then estimated using a connected-component
algorithm. The process described above is known as background subtraction: video object
segmentation is accomplished by creating a background model of the scene and then looking
for deviations from the model in each input frame.
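A minimal sketch of this pipeline, assuming OpenCV's built-in MOG2 background model; the specific model, shadow handling, and area threshold are illustrative choices rather than part of any particular published method:

```python
import cv2
import numpy as np

def background_subtraction(path, min_area=500):
    """Yield, per frame, bounding boxes (x, y, w, h) of moving objects."""
    cap = cv2.VideoCapture(path)  # 'path' is a hypothetical example file
    subtractor = cv2.createBackgroundSubtractorMOG2()
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Pixels deviating from the background model become foreground.
        fg_mask = subtractor.apply(frame)
        fg_mask = (fg_mask == 255).astype(np.uint8)  # drop shadow label (127)
        # Connected-component analysis groups foreground pixels into objects.
        n, labels, stats, _ = cv2.connectedComponentsWithStats(fg_mask)
        boxes = [tuple(stats[i, :4]) for i in range(1, n)
                 if stats[i, cv2.CC_STAT_AREA] >= min_area]
        yield boxes
    cap.release()
```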
2) Semi-Supervised Video Object Segmentation
Semi-supervised methods begin with human input, such as a pixel-accurate mask, clicks, or
scribbles, and then propagate that information to subsequent frames. Existing approaches
variously rely on superpixels, graphical models, object proposals, and optical flow with
long-term trajectories.
The architecture of these methods is typically based on semantic segmentation networks, and
each video frame is processed individually. They can be studied in two main categories:
spatio-temporal graph methods and CNN-based semi-supervised VOS.
I. Spatio-Temporal Graph
Early methods solved a spatio-temporal graph built on hand-crafted feature representations,
including appearance, boundary, and optical flow, and propagated the foreground region
throughout the video. These methods typically rely on a graph-structured object representation
with spatio-temporal connections.
The task is usually formulated as a spatio-temporal label propagation problem: graph structures
are constructed over the object representation of (i) pixels, (ii) superpixels, or (iii) object
patches to infer the labels of subsequent frames.
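A toy illustration of label propagation at the superpixel level. Real systems combine appearance with boundary and optical-flow cues and solve a global graph problem; this sketch, with hypothetical feature arrays, only does nearest-neighbor matching in appearance space:

```python
import numpy as np

def propagate_labels(feats_prev, labels_prev, feats_next):
    """Assign each next-frame superpixel the label of its nearest
    previous-frame superpixel in feature space.

    feats_prev: (N, D) features; labels_prev: (N,) 0/1 labels;
    feats_next: (M, D) features of the next frame's superpixels.
    """
    # Pairwise distances between next- and previous-frame superpixels.
    dists = np.linalg.norm(
        feats_next[:, None, :] - feats_prev[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)  # index of the closest predecessor
    return labels_prev[nearest]     # propagated (M,) labels
```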
II. Convolutional Neural Network
With the success of convolutional neural networks for static image segmentation, CNN-based
methods for video object segmentation have shown overwhelming power. Based on how they use
temporal motion information, these methods fall into two categories: motion-based and
detection-based techniques.
In general, motion-based methods make use of the temporal coherence of object motion to
formulate the problem of mask propagation from the first frame or a given annotated frame to
subsequent frames.
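A minimal sketch of such mask propagation using dense optical flow, assuming OpenCV; the Farneback flow parameters and the backward-warping scheme are illustrative choices, not a specific published method:

```python
import cv2
import numpy as np

def propagate_mask(prev_gray, next_gray, prev_mask):
    """Warp the previous frame's binary mask onto the next frame.

    prev_gray, next_gray: (H, W) uint8 grayscale frames.
    prev_mask: (H, W) uint8 binary mask for the previous frame.
    """
    # Backward flow: for each pixel of the next frame, where it came
    # from in the previous frame.
    flow = cv2.calcOpticalFlowFarneback(
        next_gray, prev_gray, None, pyr_scale=0.5, levels=3,
        winsize=15, iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    h, w = next_gray.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    # Sample the previous mask at the source locations.
    return cv2.remap(prev_mask, map_x, map_y, cv2.INTER_NEAREST)
```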
Detection-based methods, by contrast, learn an appearance model without using temporal
information and perform pixel-level detection and segmentation of the object in each frame.
They rely on the first-frame annotation of a given test sequence to fine-tune a deep network.
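A minimal PyTorch sketch of this first-frame fine-tuning idea (in the spirit of detection-based methods such as OSVOS). The choice of network (torchvision's DeepLabV3), the loss, the step count, and the tensor shapes are all illustrative assumptions:

```python
import torch
import torch.nn.functional as F
from torchvision.models.segmentation import deeplabv3_resnet50

def finetune_on_first_frame(frame, mask, steps=200, lr=1e-4):
    """frame: (3, H, W) float tensor; mask: (H, W) binary tensor."""
    model = deeplabv3_resnet50(num_classes=1)   # one foreground channel
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    x = frame.unsqueeze(0)                      # (1, 3, H, W)
    y = mask.float().unsqueeze(0).unsqueeze(0)  # (1, 1, H, W)
    for _ in range(steps):  # deliberately overfit to the single annotation
        opt.zero_grad()
        logits = model(x)["out"]                # (1, 1, H, W)
        F.binary_cross_entropy_with_logits(logits, y).backward()
        opt.step()
    return model

def segment_sequence(model, frames):
    """Segment every frame independently, with no temporal information."""
    model.eval()
    with torch.no_grad():
        return [torch.sigmoid(model(f.unsqueeze(0))["out"])[0, 0] > 0.5
                for f in frames]
```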
Video Standards
A video standard is a standard for video display adapters, developed so that software developers
can anticipate how their programs will appear on the screen. Industry groups define video
standards, and they specify maximum screen resolution, which is measured by the number of
pixels that can be displayed horizontally and the number of lines that can be displayed vertically;
for example, the VGA standard defines a resolution of 640 pixels by 480 lines (640 x 480).
Different video standards define the resolution and colors for display. Support for a given
graphics standard is determined both by the monitor and by the video adapter: the monitor must
be able to display the resolution and colors defined by the standard, and the video adapter must
be capable of transmitting the appropriate signals to the monitor.
Listed below, in rough order of increasing power and sophistication, are the more common video
standards for PCs. Note that many of these figures represent only the minimums specified in the
standards; many suppliers of video adapters offer greater resolution and more colors.
● VGA – 640 x 480 resolution
● SVGA – 800 x 600 resolution
● XGA – 1024 x 768 resolution
● SXGA – 1280 x 1024 resolution
● UXGA – 1600 x 1200 resolution
Video display standards also specify the adapter’s maximum color depth, which is determined by
the bit length of the information used to represent distinct colors; a color depth of 8 bits yields
256 colors, a depth of 16 bits yields 65,536 colors, and a depth of 24 bits yields 16,777,216
colors. Although the various video standards originally defined additional electronic
characteristics of video adapters, such as video interlacing, they are now used to indicate an
adapter or display’s maximum resolution.
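The color counts above follow directly from the bit depth: an n-bit value can encode 2**n distinct colors, as this short check illustrates:

```python
# Each n-bit color value can encode 2**n distinct colors.
for depth in (8, 16, 24):
    print(f"{depth}-bit color depth -> {2 ** depth:,} colors")
# -> 256, 65,536, and 16,777,216 colors, matching the figures above.
```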
Common Uses of Video Standard
● The MPEG group created a video standard that specifies the coded bitstream for high-
quality digital video.
● The Audio Video Standard (AVS), together with its reference decoder implementation, was
developed in China.
● A video standard’s resolution helps determine the image quality produced by your PC screen.
Common Misuses of Video Standard
● Believing that a video standard is only suitable for playing MP3 files; video standards concern
display resolution and color depth, not audio formats.