DD2423 Image Analysis and Computer Vision
IMAGE FORMATION
Mårten Björkman
Computational Vision and Active Perception
School of Computer Science and Communication
November 8, 2013
Image formation
Goal: Model the image formation process
• Image acquisition
• Perspective projection
– properties
– approximations
• Homogeneous coordinates
• Sampling
• Image warping
Image formation
Image formation is a physical process that captures scene illumination
through a lens system and relates the measured energy to a signal.
Basic concepts
• Irradiance E: the amount of light falling on a surface, in power per unit area (watts
per square meter). If the surface tilts away from the light, the same amount of light
strikes a bigger surface (foreshortening → less irradiance).
• Radiance L: the amount of light radiated from a surface, in power per unit area
per unit solid angle. Informally: “brightness”.
• Image irradiance E is proportional to scene radiance L.
Light source examples
Left: forest image - (left) sun behind the observer, (right) sun opposite the observer.
Right: field with a rough surface - (left) sun behind the observer, (right) sun opposite the observer.
Digital imaging
Image irradiance E × area × exposure time → Intensity
• Sensors read the light intensity, which may be filtered through color filters, and
digital memory devices store the digital image information either in an RGB color
space or as raw data.
• An image is discretized: sampled on a discrete 2D grid → array of color values.
Image acquisition - From world point to pixel
• World points are projected onto a camera sensor chip.
• Camera sensors sample the irradiance to compute energy values.
• Positions in camera coordinates (in mm) are converted to image coordinates
(in pixels) based on the intrinsic parameters of the camera (a conversion sketched below):
- size of each sensor element,
- aspect ratio of the sensor (xsize/ysize),
- total number of sensor elements,
- image center of the sensor chip relative to the lens system.
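A minimal sketch of this conversion. The pixel size, aspect ratio and chip center below are made-up values for illustration, not parameters from the slides:

```python
# Hypothetical intrinsic parameters (made-up values, for illustration only)
pixel_size_mm = 0.005        # size of each sensor element: 5 micrometres
aspect_ratio = 1.0           # xsize/ysize of a sensor element
center_px = (320.0, 240.0)   # chip center relative to the lens system, in pixels

def mm_to_pixels(x_mm, y_mm):
    """Convert a sensor position (mm, origin on the optical axis)
    to image coordinates (pixels, origin in the upper-left corner)."""
    u = x_mm / pixel_size_mm + center_px[0]
    v = y_mm / (pixel_size_mm * aspect_ratio) + center_px[1]
    return u, v

print(mm_to_pixels(0.5, -0.25))   # -> (420.0, 190.0)
```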
Steps in a typical image processing system
• Image acquisition: capturing visual data by a vision sensor
• Discretization/digitization - Quantization - Compression: Convert data into
discrete form; compress for efficient storage/transmission
• Image enhancement: Improving image quality (low contrast, blur, noise)
• Image segmentation: Partition image into objects or constituent parts.
• Feature detection: Extracting pertinent features from an image that are
important for differentiating one class of objects from another.
• Image representation: Assigning labels to an object based on information
provided by descriptors.
• Image interpretation: Assigning meaning to image information.
Pinhole camera or “Camera Obscura”
Pinhole camera and perspective projection
• A mapping from the three-dimensional (3D) world onto a two-dimensional (2D)
plane, as in the previous example, is called a perspective projection.
• A pinhole camera is the simplest imaging device which captures the geometry
of perspective projection.
• Rays of light enter the camera through an infinitesimally small aperture.
• The intersection of the light rays with the image plane forms the image of the object.
Perspective projection
Pinhole camera - Perspective geometry
[Figure: pinhole camera geometry. The optical center is at the origin, the optical
axis runs along the Z axis, and the image plane lies at distance f (the focal length).
A world point P = (X, Y, Z) in world coordinates projects to the image point
p = (x, y, f) in image coordinates.]
• The image plane is usually modeled in front of the optical center.
• The coordinate systems in the world and in the image domain are
parallel. The optical axis is perpendicular to the image plane.
Lenses
• Purpose: gather light from a larger opening (aperture)
• Problem: only light rays from points on the focal plane intersect the
same point on the image plane
• Result: blurring for points in front of or behind the focal plane
• Focal depth: the range of distances with acceptable blurring
Imaging geometry - Basic camera models
• Perspective projection (general camera model)
All visual rays converge to a common point - the focal point
• Orthographic projection (approximation: distant objects, center of view)
All visual rays are perpendicular to the image plane
[Figure: perspective projection, with all visual rays converging at the focal point,
vs. orthographic projection, with all visual rays perpendicular to the image plane.]
Projection equations
[Figure: similar triangles relating image coordinate y, focal length f, world
coordinate Y and depth Z.]
• Perspective mapping
x/f = X/Z, y/f = Y/Z
• Orthographic projection
x = X, y = Y
• Scaled orthography, with constant representative depth Z0
x/f = X/Z0, y/f = Y/Z0
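A small numerical illustration of the three mappings; the focal length and the point are chosen arbitrarily:

```python
f = 1.0                       # focal length (arbitrary)
X, Y, Z = 2.0, 1.0, 4.0       # a world point in camera coordinates
Z0 = 5.0                      # representative depth for scaled orthography

persp = (f * X / Z, f * Y / Z)      # perspective: x = f X/Z, y = f Y/Z
ortho = (X, Y)                      # orthographic: x = X, y = Y
scaled = (f * X / Z0, f * Y / Z0)   # scaled orthography: fixed depth Z0

print(persp, ortho, scaled)   # (0.5, 0.25) (2.0, 1.0) (0.4, 0.2)
```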
Perspective transformation
• A perspective transformation has three components:
- Rotation - from world to camera coordinate system
- Translation - from world to camera coordinate system
- Perspective projection - from camera to image coordinates
• Basic properties which are preserved:
- lines project to lines,
- collinear features remain collinear,
- tangencies,
- intersections.
Perspective transformation (cont)
[Figure: parallel lines in the scene projected through the camera centre meet
at a vanishing point in the image.]
Each set of parallel lines meets at a different vanishing point - the vanishing point
associated with that direction. Sets of parallel lines on the same plane lead to collinear
vanishing points - the line through them is called the horizon for that plane.
Homogeneous coordinates
• Model points (X, Y, Z) in the R^3 world by (kX, kY, kZ, k), where k ≠ 0 is arbitrary,
and points (x, y) in the R^2 image domain by (cx, cy, c), where c ≠ 0 is arbitrary.
• Equivalence relation: (k1X, k1Y, k1Z, k1) is the same point as (k2X, k2Y, k2Z, k2).
• Homogeneous coordinates imply that we regard all points on a ray (cx, cy, c) as
equivalent (if we only know the image projection, we do not know the depth).
• It is possible to represent “points at infinity” with homogeneous coordinates
(X, Y, Z, 0) - the intersections of parallel lines.
Computing vanishing points
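The figure for this slide is not reproduced. As a sketch of the computation: the vanishing point for all lines with 3D direction D = (DX, DY, DZ) is the projection of the point at infinity (DX, DY, DZ, 0):

```python
def vanishing_point(D, f=1.0):
    """Vanishing point of all 3D lines with direction D = (DX, DY, DZ):
    the projection of the homogeneous point at infinity (DX, DY, DZ, 0)."""
    DX, DY, DZ = D
    if abs(DZ) < 1e-12:
        return None   # direction parallel to the image plane: no finite vanishing point
    return (f * DX / DZ, f * DY / DZ)

# All lines with direction (1, 0, 1) vanish at (f, 0), wherever they start.
print(vanishing_point((1.0, 0.0, 1.0), f=480.0))   # -> (480.0, 0.0)
```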
Homogeneous coordinates (cont)
In homogeneous coordinates the projection equations can be written

[cx]   [f 0 0 0] [kX]   [f kX]
[cy] = [0 f 0 0] [kY] = [f kY]
[c ]   [0 0 1 0] [kZ]   [kZ  ]
                 [k ]

Image coordinates are obtained by normalizing the third component to one
(divide by c = kZ):

x = cx/c = f kX/(kZ) = f X/Z,  y = cy/c = f kY/(kZ) = f Y/Z
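The same computation in NumPy, with an arbitrary focal length and point:

```python
import numpy as np

f = 480.0
P = np.array([[f, 0, 0, 0],
              [0, f, 0, 0],
              [0, 0, 1, 0]], dtype=float)

Xh = np.array([3.0, -2.0, 8.0, 1.0])   # homogeneous world point, k = 1
cx, cy, c = P @ Xh                     # -> (f X, f Y, Z)
x, y = cx / c, cy / c                  # normalize the third component to one
print(x, y)                            # -> 180.0 -120.0, i.e. (f X/Z, f Y/Z)
```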
Transformations in homogeneous coordinates
• Translation

(X, Y, Z) → (X + ∆X, Y + ∆Y, Z + ∆Z)

[X]   [1 0 0 ∆X] [X]
[Y] → [0 1 0 ∆Y] [Y]
[Z]   [0 0 1 ∆Z] [Z]
[1]   [0 0 0 1 ] [1]

• Scaling

[X]   [SX 0  0  0] [X]
[Y] → [0  SY 0  0] [Y]
[Z]   [0  0  SZ 0] [Z]
[1]   [0  0  0  1] [1]
Transformations in homogeneous coordinates II
• Rotation around the Z axis

[X]   [cos θ  −sin θ  0  0] [X]
[Y] → [sin θ   cos θ  0  0] [Y]
[Z]   [0       0      1  0] [Z]
[1]   [0       0      0  1] [1]

• Mirroring in the XY plane

[X]   [1  0   0  0] [X]
[Y] → [0  1   0  0] [Y]
[Z]   [0  0  −1  0] [Z]
[1]   [0  0   0  1] [1]
Transformations in homogeneous coordinates III
Common case: Rigid body transformations (Euclidean)
[X']     [X]   [∆X]
[Y'] = R [Y] + [∆Y]
[Z']     [Z]   [∆Z]

where R is a rotation matrix (R^-1 = R^T). In homogeneous coordinates this is written

[X']   [         ∆X] [X]
[Y'] = [    R    ∆Y] [Y]
[Z']   [         ∆Z] [Z]
[1 ]   [0  0  0  1 ] [1]
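A sketch that builds such a 4×4 rigid-body matrix and applies it to a homogeneous point; the angle and translation are chosen arbitrarily:

```python
import numpy as np

theta = np.pi / 2                       # rotation angle around the Z axis
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([1.0, 0.0, 2.0])           # translation (DX, DY, DZ)

T = np.eye(4)                           # homogeneous rigid-body transform
T[:3, :3] = R                           # rotation block
T[:3, 3] = t                            # translation column

X = np.array([1.0, 0.0, 0.0, 1.0])      # a homogeneous point
print(T @ X)                            # -> [1. 1. 2. 1.] (up to rounding)

print(np.allclose(np.linalg.inv(R), R.T))   # True: R^-1 = R^T
```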
Perspective projection - Extrinsic parameters
Consider world coordinates (X', Y', Z', 1) expressed in a coordinate system
not aligned with the camera coordinate system:

[X]   [         ∆X] [X']     [X']
[Y] = [    R    ∆Y] [Y']  = A [Y']
[Z]   [         ∆Z] [Z']     [Z']
[1]   [0  0  0  1 ] [1 ]     [1 ]

Perspective projection (a more general form comes later):

  [x]   [f 0 0 0] [X]        [X']      [X']
c [y] = [0 f 0 0] [Y]  = P A [Y']  = M [Y']
  [1]   [0 0 1 0] [Z]        [Z']      [Z']
                  [1]        [1 ]      [1 ]
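A sketch composing the full camera matrix M = P A and projecting a world point; all numbers are arbitrary:

```python
import numpy as np

f = 1.0
P = np.array([[f, 0, 0, 0],
              [0, f, 0, 0],
              [0, 0, 1, 0]], dtype=float)

A = np.eye(4)                        # extrinsics: here R = I and a pure translation
A[:3, 3] = [0.0, 0.0, 10.0]          # world origin placed 10 units along Z

M = P @ A                            # complete mapping from world to image
Xw = np.array([2.0, 4.0, 0.0, 1.0])  # a world point (not camera coordinates)
cx, cy, c = M @ Xw
print(cx / c, cy / c)                # -> 0.2 0.4
```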
Intrinsic camera parameters
Due to imperfect placement of the camera chip relative to the lens system,
there is always a small relative rotation and shift of center position.
Intrinsic camera parameters
A more general projection matrix allows:
• Image coordinates with an offset origin
• Non-square pixels
• Skewed coordinate axes
The five variables below are known as the camera’s intrinsic parameters:

    [fu  γ   u0]                    [fu  γ   u0  0]
K = [0   fv  v0],   P = [K | 0] =   [0   fv  v0  0]
    [0   0   1 ]                    [0   0   1   0]

Most important is the focal length (fu, fv). Normally fu and fv are assumed
equal, and the parameters γ, u0 and v0 are close to zero. (A sketch with
made-up values follows below.)
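A sketch with made-up intrinsic values, here a 640×480 image with the principal point at its center:

```python
import numpy as np

fu, fv = 480.0, 480.0   # focal lengths in pixel units, assumed equal
gamma = 0.0             # skew, normally close to zero
u0, v0 = 320.0, 240.0   # principal point, with the origin in the upper-left corner

K = np.array([[fu,  gamma, u0],
              [0.0, fv,    v0],
              [0.0, 0.0,   1.0]])
P = np.hstack([K, np.zeros((3, 1))])   # P = [K | 0]

Xc = np.array([3.0, -2.0, 8.0, 1.0])   # a point in camera coordinates
u, v, w = P @ Xc
print(u / w, v / w)                    # -> 500.0 120.0
```

With these values the projected pixel coordinates reproduce the exercise a few slides below.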
Example: Perspective mapping
Example: Perspective mapping in stereo
Mosaicing
Exercise
Assume you have a point at (3m, −2m, 8m) with respect to the camera’s coordinate
system. What are the image coordinates, if the image has size (w, h) = (640, 480)
with the origin in the upper-left corner, and the focal length is f = 480?
Answer:
x = f X/Z + w/2 = 480 · 3/8 + 640/2 = 500
y = f Y/Z + h/2 = −480 · 2/8 + 480/2 = 120
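A quick check of the arithmetic:

```python
f, (w, h) = 480.0, (640, 480)
X, Y, Z = 3.0, -2.0, 8.0
x = f * X / Z + w / 2    # 180 + 320
y = f * Y / Z + h / 2    # -120 + 240
print(x, y)              # -> 500.0 120.0
```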
Approximation: affine camera
• A linear approximation of perspective projection
[x]   [m11 m12 m13 m14] [X]
[y] = [m21 m22 m23 m24] [Y]
[1]   [0   0   0   1  ] [Z]
                        [1]
• Basic properties
– linear transformation (no need to divide at the end)
– parallel lines in 3D are mapped to parallel lines in 2D (illustrated in the sketch below)
Angles are not preserved!
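A sketch illustrating the parallelism property with an arbitrary affine camera matrix (all entries made up):

```python
import numpy as np

# An arbitrary affine camera: the last row is (0, 0, 0, 1)
M = np.array([[1.0, 0.2, 0.1, 5.0],
              [0.0, 0.9, 0.3, 2.0],
              [0.0, 0.0, 0.0, 1.0]])

def project(Xh):
    x, y, w = M @ Xh
    return np.array([x / w, y / w])   # w is always 1: no real division needed

D = np.array([1.0, 1.0, 0.0, 0.0])    # common direction of two parallel 3D lines
A = np.array([0.0, 0.0, 0.0, 1.0])    # a point on the first line
B = np.array([0.0, 0.0, 4.0, 1.0])    # a point on the second line

d1 = project(A + 2 * D) - project(A)  # image direction of the first line
d2 = project(B + 2 * D) - project(B)  # image direction of the second line
print(d1[0] * d2[1] - d1[1] * d2[0])  # -> 0.0: the image lines are parallel
```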
Planar Affine Transformation
[Figure: planar affine transformations of a test image - original, flipped x-size,
shifted and scaled, sheared.]
Summary of models
Projective (11 degrees of freedom):
    [m11 m12 m13 m14]
M = [m21 m22 m23 m24]
    [m31 m32 m33 m34]

Affine (8 degrees of freedom):
    [m11 m12 m13 m14]
M = [m21 m22 m23 m24]
    [0   0   0   1  ]

Scaled orthographic (6 degrees of freedom):
    [r11 r12 r13 ∆X]
M = [r21 r22 r23 ∆Y]
    [0   0   0   Z0]

Orthographic (5 degrees of freedom):
    [r11 r12 r13 ∆X]
M = [r21 r22 r23 ∆Y]
    [0   0   0   1 ]
All these are just approximations, since they all assume an infinitesimally
small pinhole.
Sampling and quantization
• Sample the continuous signal at a finite set of points and quantize
the registered values into a finite number of levels.
• Sampling distances ∆x, ∆y and ∆t determine how rapid spatial and
temporal variations can be captured.
Sampling and quantization
• Sampling due to limited spatial and temporal resolution.
• Quantization due to limited intensity resolution.
Factors that affect quality
• Quantization: Assigning, usually integer, values to pixels (sampling the
amplitude of a function); see the sketch below.
• Quantization error: The difference between the real value and the assigned one.
• Saturation: When the physical value moves outside the allocated range,
it is represented by the end-of-range value.
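A sketch of quantization with saturation, assuming an 8-bit output range:

```python
import numpy as np

def quantize(signal, levels=256, lo=0.0, hi=1.0):
    """Map a continuous signal into `levels` integer values.
    Values outside [lo, hi] saturate at the ends of the range."""
    clipped = np.clip(signal, lo, hi)                        # saturation
    q = np.round((clipped - lo) / (hi - lo) * (levels - 1))
    return q.astype(np.uint8)

s = np.array([-0.2, 0.0, 0.5, 1.0, 1.3])
print(quantize(s))                 # -> [  0   0 128 255 255]
print(s - quantize(s) / 255.0)     # quantization error; largest where saturated
```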
Different image resolutions
Different number of bits per pixel
Image warping
Resample image f (x, y) to get a new image g(u, v), using a coordinate trans-
formation: u = u(x, y), v = v(x, y).
[Figure: examples of such coordinate transformations.]
Image Warping
• For each grid point in the (u, v) domain, compute the corresponding (x, y) values.
Note: the transformation is inverted to avoid holes in the result.
• Create g(u, v) by sampling from f (x, y) either by:
– Nearest neighbour look-up (noisy result)
– Bilinear interpolation (blurry result), sketched below:
f (x + s, y + t) = (1 − t) · ((1 − s) · f (x, y) + s · f (x + 1, y)) +
+ t · ((1 − s) · f (x, y + 1) + s · f (x + 1, y + 1))
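A sketch of the warping loop with bilinear interpolation; the inverse mapping here is a hypothetical 2× upscaling:

```python
import numpy as np

def bilinear(f, x, y):
    """Sample image f at the real-valued position (x, y), with f indexed f[y, x]."""
    x0, y0 = int(x), int(y)
    s, t = x - x0, y - y0
    return ((1 - t) * ((1 - s) * f[y0, x0]     + s * f[y0, x0 + 1]) +
            t       * ((1 - s) * f[y0 + 1, x0] + s * f[y0 + 1, x0 + 1]))

def warp(f, inverse_map, out_shape):
    """For each grid point (u, v) in the output, look up the source (x, y).
    The transformation is inverted to avoid holes in the result."""
    g = np.zeros(out_shape)
    for v in range(out_shape[0]):
        for u in range(out_shape[1]):
            x, y = inverse_map(u, v)
            if 0 <= x < f.shape[1] - 1 and 0 <= y < f.shape[0] - 1:
                g[v, u] = bilinear(f, x, y)
    return g

f = np.arange(16, dtype=float).reshape(4, 4)
g = warp(f, lambda u, v: (u / 2.0, v / 2.0), (6, 6))   # a 2x upscaling
print(g)
```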
Nearest Neighbor vs. Bilinear Interpolation
Summary of good questions
• What parameters affect the quality in the acquisition process?
• What is a pinhole camera model?
• What is the difference between intrinsic and extrinsic camera parameters?
• How does a 3D point get projected to a pixel with a perspective projection?
• What are homogeneous coordinates and what are they good for?
• What is a vanishing point and how do you find it?
• What is an affine camera model?
• What is sampling and quantization?
Readings
• Gonzalez and Woods: Chapter 2
• Szeliski: Chapters 2.1 and 2.3.1