Video and Motion Estimation
Le Thanh Ha, Dr.
Laboratory of
Human Machine Interface
MOTION ESTIMATION
4/19/2025 Lê Thanh Hà 2
Temporal dependency
Visual data changes very little between two consecutive frames
Removing this dependency significantly reduces the entropy of the
residual data to be encoded:
Motion estimation/compensation
Multiframe reference
Temporal dependency
The differences between the values of colocated pixels in two consecutive
frames are very small.
These differences are caused by noise and by the visual motion of objects in
the scene
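A minimal sketch of why this matters for coding, assuming two hypothetical 16x16 frames built from a synthetic pattern (the frames, the 0/1 "noise", and the helper name `entropy` are illustrative assumptions, not taken from the slides). It compares the empirical entropy of a frame with that of the frame difference:

```python
import math
from collections import Counter

def entropy(values):
    """Empirical (zeroth-order) entropy in bits per sample."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Hypothetical 16x16 frame with a textured pattern (illustrative data only).
W = H = 16
frame1 = [[(7 * x + 13 * y) % 256 for x in range(W)] for y in range(H)]
# Next frame: same content plus a small deterministic "noise" of 0 or 1.
frame2 = [[min(255, frame1[y][x] + (x + y) % 2) for x in range(W)]
          for y in range(H)]

pixels = [p for row in frame2 for p in row]
residual = [frame2[y][x] - frame1[y][x] for y in range(H) for x in range(W)]

h_frame = entropy(pixels)        # entropy of the raw pixel values
h_residual = entropy(residual)   # entropy of the colocated differences
print(f"frame: {h_frame:.2f} bits/pixel, residual: {h_residual:.2f} bits/pixel")
```

Because the residual here only takes the values 0 and 1, its entropy cannot exceed 1 bit per pixel, while the raw frame spreads over many intensity levels.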
Predictive coding using motion
The visual motion of an object causes changes across a sequence of
frames.
Predictive coding using motion
Motion between frames must be estimated and modelled in
terms of motion vectors (the motion field).
The difference between frames is calculated after compensating
with the estimated motion vectors.
Only the calculated differences and the motion vectors are
coded and signalled to the decoder.
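The encoder/decoder relationship described above can be sketched as follows. This is an illustration under stated assumptions, not a codec: the function names `predict`, `encode`, `decode`, the dictionary layout of the motion field, and the border clamping are all hypothetical choices, and no quantization or entropy coding is modelled.

```python
def predict(ref, mvs, block):
    """Build the motion-compensated prediction of the current frame.

    ref   : reference frame as a list of rows
    mvs   : dict mapping block origin (by, bx) -> motion vector (dy, dx)
    block : block size in pixels (assumed to divide the frame size)
    """
    h, w = len(ref), len(ref[0])
    pred = [[0] * w for _ in range(h)]
    for (by, bx), (dy, dx) in mvs.items():
        for y in range(by, by + block):
            for x in range(bx, bx + block):
                # Clamp so vectors pointing outside the frame reuse border pixels.
                ry = min(max(y + dy, 0), h - 1)
                rx = min(max(x + dx, 0), w - 1)
                pred[y][x] = ref[ry][rx]
    return pred

def encode(cur, ref, mvs, block):
    """Residual = current frame minus motion-compensated prediction."""
    pred = predict(ref, mvs, block)
    return [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(cur, pred)]

def decode(residual, ref, mvs, block):
    """The decoder rebuilds the same prediction and adds the residual."""
    pred = predict(ref, mvs, block)
    return [[r + p for r, p in zip(rr, pr)] for rr, pr in zip(residual, pred)]

# Hypothetical 8x8 example: the current frame is the reference shifted
# right by one pixel (the left border column is repeated).
ref = [[(3 * x + 5 * y) % 256 for x in range(8)] for y in range(8)]
cur = [[ref[y][max(x - 1, 0)] for x in range(8)] for y in range(8)]
mvs = {(by, bx): (0, -1) for by in (0, 4) for bx in (0, 4)}

residual = encode(cur, ref, mvs, block=4)
reconstructed = decode(residual, ref, mvs, block=4)
```

With the correct motion vectors the residual in this example is zero everywhere, so only the vectors carry information. With wrong vectors the decoder still reconstructs the frame exactly, but the residual is no longer cheap to code.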
2D Motion vs. Optical Flow
True 2D motion
There is 3D motion between
the object and the camera
It is projected onto the 2D
imaging plane
What kind of motion?
2D Motion vs. Optical Flow
Optical flow
The observed (perceived, apparent) 2D motion, based on changes in pixel
luminance
It also depends on illumination and object surface texture
It may not represent true 2D motion
Optical Flow Equation
Given only a video sequence, without any other information (such as the illumination conditions), we cannot
estimate the true 2D motion.
The best one can hope to estimate is optical flow
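Constant intensity assumption → optical flow equation:
$\nabla\psi^{T}\mathbf{v} + \frac{\partial\psi}{\partial t} = 0$
where $\nabla\psi$ is the spatial gradient of the luminance $\psi$ and $\mathbf{v}$ is the optical flow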
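One common way to arrive at the optical flow equation, sketched here with assumed notation ($\psi(x, y, t)$ for luminance, $(d_x, d_y)$ for the displacement over a time step $d_t$, and $\mathbf{v} = (v_x, v_y)$ for the flow), is a first-order Taylor expansion of the constant intensity assumption:

```latex
\begin{aligned}
\psi(x + d_x,\, y + d_y,\, t + d_t) &= \psi(x, y, t)
  && \text{(constant intensity)} \\
\psi(x, y, t) + \frac{\partial\psi}{\partial x} d_x
  + \frac{\partial\psi}{\partial y} d_y
  + \frac{\partial\psi}{\partial t} d_t &\approx \psi(x, y, t)
  && \text{(first-order Taylor expansion)} \\
\frac{\partial\psi}{\partial x} v_x
  + \frac{\partial\psi}{\partial y} v_y
  + \frac{\partial\psi}{\partial t} &= 0
  && \text{(divide by } d_t,\ v_x = d_x / d_t,\ v_y = d_y / d_t\text{)}
\end{aligned}
```

The approximation holds only for small displacements, and it gives one scalar equation for the two unknowns $v_x$ and $v_y$.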
Ambiguities in Optical Flow Estimation
Optical flow equation only constrains the motion
vector in the gradient direction (vn)
The flow vector in the tangent direction (vt) is
under-determined
We can only determine the displacement that is orthogonal to
the edges
In regions with constant brightness (zero spatial gradient), the flow
is indeterminate
Optical flow estimation is unreliable in regions with flat
texture and more reliable near edges
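The ambiguity can be made concrete with a small sketch. Assuming hypothetical derivative values `gx`, `gy` (spatial) and `gt` (temporal) at one pixel, the only component the optical flow equation determines is the one along the gradient; the function name `normal_flow` and the `eps` tolerance are illustrative choices.

```python
import math

def normal_flow(gx, gy, gt, eps=1e-9):
    """Return the flow component along the spatial gradient, or None.

    gx, gy : spatial derivatives of luminance at a pixel
    gt     : temporal derivative at the same pixel
    From gx*vx + gy*vy + gt = 0, only v_n = -gt / |gradient| is determined.
    """
    mag = math.hypot(gx, gy)
    if mag < eps:
        return None  # flat region: the flow is indeterminate
    vn = -gt / mag
    # Vector form: v_n times the unit vector in the gradient direction.
    return vn, (vn * gx / mag, vn * gy / mag)

vn, (vx, vy) = normal_flow(3.0, 4.0, -10.0)

# Adding any amount k of tangential motion still satisfies the equation,
# which is why the tangent component cannot be recovered from one pixel.
k = 2.5
tx, ty = -4.0 / 5.0, 3.0 / 5.0           # unit vector orthogonal to the gradient
constraint = 3.0 * (vx + k * tx) + 4.0 * (vy + k * ty) - 10.0
```

Here `constraint` stays at zero (up to rounding) for every `k`, and `normal_flow(0.0, 0.0, 5.0)` returns `None` because a flat region gives no constraint at all.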
The aperture problem
[Figure: perceived motion through the aperture vs. actual motion]
The barber pole illusion
http://en.wikipedia.org/wiki/Barberpole_illusion
General Considerations for Motion Estimation
Two categories of approaches
Feature based
Correspondence between edges, points, etc.
Object tracking, 3D reconstruction from 2D
Focus in computer vision
Intensity based
Optical flow estimation based on the constant intensity assumption
Motion-compensated prediction and filtering
Focus in video coding
Three important questions
How to represent the motion field?
Which cost function (criterion) to use to estimate motion parameters?
Which optimization technique?
Motion Representation
Notations
Block Matching Algorithm (BMA)
Overview
Assume all pixels in a block experience the same translation, represented by a single motion vector (MV)
Estimate the MV for each block independently, by minimizing the DFD (displaced frame difference) error over that block
Cost function to minimize
MAD: mean absolute difference (p=1)
MSE: mean square error (p=2)
Optimization method
Exhaustive search
Fast search algorithms
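For reference, the block-wise DFD error mentioned above is commonly written as below. The notation is an assumption introduced here: $\psi_1$ is the anchor frame, $\psi_2$ the target frame, $B_m$ the set of pixels in block $m$, and $\mathbf{d}_m$ the candidate motion vector for that block.

```latex
E_{\mathrm{DFD}}(\mathbf{d}_m) = \sum_{\mathbf{x} \in B_m}
  \left| \psi_2(\mathbf{x} + \mathbf{d}_m) - \psi_1(\mathbf{x}) \right|^{p},
\qquad
\hat{\mathbf{d}}_m = \arg\min_{\mathbf{d}_m} E_{\mathrm{DFD}}(\mathbf{d}_m)
```

Dividing the sum by the number of pixels in $B_m$ gives the MAD for $p = 1$ and the MSE for $p = 2$. The normalization does not change which vector minimizes the error.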
Exhaustive Block Matching Algorithm (EBMA)
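A minimal sketch of exhaustive block matching, assuming integer-pel accuracy, a sum-of-absolute-differences cost (the unnormalized p=1 case), and frames stored as lists of rows. The function name `ebma`, its parameters, and the synthetic test frames are hypothetical; candidates that would leave the frame are simply skipped.

```python
def ebma(anchor, target, block=4, search=2):
    """Exhaustive block matching with integer-pel accuracy.

    For each block of `anchor`, try every displacement (dy, dx) with
    |dy|, |dx| <= search and keep the one minimizing the sum of absolute
    differences (SAD) against `target`. Returns {(by, bx): (dy, dx)}.
    """
    h, w = len(anchor), len(anchor[0])
    mvs = {}
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            best_cost, best_mv = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    ty, tx = by + dy, bx + dx
                    # Skip candidates that fall outside the target frame.
                    if ty < 0 or tx < 0 or ty + block > h or tx + block > w:
                        continue
                    cost = sum(
                        abs(target[ty + y][tx + x] - anchor[by + y][bx + x])
                        for y in range(block) for x in range(block)
                    )
                    if best_cost is None or cost < best_cost:
                        best_cost, best_mv = cost, (dy, dx)
            mvs[(by, bx)] = best_mv
    return mvs

# Hypothetical 12x12 test frame whose pixel values encode their position.
anchor = [[16 * y + x for x in range(12)] for y in range(12)]
# Target: anchor content moved down by 1 and right by 2
# (zeros fill the uncovered border).
target = [[anchor[y - 1][x - 2] if y >= 1 and x >= 2 else 0
           for x in range(12)] for y in range(12)]

mvs = ebma(anchor, target, block=4, search=2)
```

For the interior block at `(4, 4)` the search recovers the displacement `(1, 2)`. The cost of the search grows with `(2 * search + 1) ** 2` candidates per block, which is the motivation for the fast search algorithms.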
Block Matching Algorithm (BMA)
VIDEO ANALYSIS
Coded video
Bit stream
• Video packet
• Buffer activities
Compression parameter
• Motion fields
• Residuals
• Prediction modes, QP
Picture
• Pixels
• Frames
• Picture
Video analysis (Context)
• Narrative and Storytelling: Analyzing the plot, characters, themes, and overall story of the video.
• Message and Tone: Identifying the intended message, perspective, and emotional tone conveyed by
the video.
• Context and Audience: Understanding the historical, cultural, and social context surrounding the
video and who the intended audience is.
• Purpose and Function: Determining the reason for the video's creation (e.g., entertainment,
education, advocacy) and its intended function.
• Media Literacy: Evaluating the video's credibility, bias, and potential impact on viewers.
• Social and Cultural Impact: Analyzing the video's influence on social norms, attitudes, and
behaviors.
• Reaction Videos: Studying how individuals react to and interpret the content of other videos.
• Explanatory and Instructional Videos: Analyzing the effectiveness of the video in explaining a
concept or demonstrating a process.
Video analysis (technique)
• Motion Analysis: Studying the movement of objects or individuals in the video (e.g., in sports, film, or
surveillance).
• Object Detection and Tracking: Identifying and tracking specific objects within the video frames (e.g., cars,
people, animals).
• Activity and Behavior Analysis: Recognizing and classifying different activities and behaviors performed by
subjects in the video.
• Facial Recognition: Identifying and analyzing facial expressions and characteristics within the video.
• Scene Change Detection: Identifying transitions between different scenes or locations in the video.
• Anomaly Detection: Identifying unusual or unexpected events or objects within the video.
• Optical Character Recognition (OCR): Extracting text from the video frames.
• Video Summarization: Creating concise summaries of key events or information within the video.
• Image and Video Classification: Categorizing the content of individual frames or entire videos based on
predefined categories.
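As one illustration of the techniques listed above, scene change detection can be sketched with a simple heuristic: flag a frame when its mean absolute difference from the previous frame exceeds a threshold. The function names, the threshold value, and the tiny constant frames are assumptions for illustration only; practical detectors often use histograms or the motion fields and residuals of the coded video instead.

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two equally sized frames."""
    total = sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return total / (len(a) * len(a[0]))

def scene_changes(frames, threshold=30.0):
    """Indices i where frame i differs strongly from frame i - 1."""
    return [i for i in range(1, len(frames))
            if mean_abs_diff(frames[i - 1], frames[i]) > threshold]

def flat(value, size=4):
    """Hypothetical helper: a size x size frame of constant luminance."""
    return [[value] * size for _ in range(size)]

# Two dark frames followed by two bright frames: one cut, at index 2.
frames = [flat(10), flat(12), flat(200), flat(202)]
cuts = scene_changes(frames)
```

The small changes between frames 0 and 1 and between frames 2 and 3 stay below the threshold, so only the jump from dark to bright is reported.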