3.8 Exercises
Ex 3.1: Color balance. Write a simple application to change the color balance of an image
by multiplying each color value by a different user-specified constant. If you want to get
fancy, you can make this application interactive, with sliders.
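A minimal sketch of the core operation in NumPy (the linearize flag and the gamma value of 2.2 are my own illustrative choices, and connect to question 1 below):

```python
import numpy as np

def color_balance(image, gains, linearize=False, gamma=2.2):
    """Multiply each color channel by a user-specified gain.

    image: H x W x 3 uint8 RGB array; gains: (r, g, b) multipliers.
    If linearize is True, an assumed display gamma is removed before
    the multiplication and re-applied afterwards.
    """
    img = image.astype(np.float64) / 255.0
    if linearize:
        img = img ** gamma             # undo gamma: back to linear light
    img = img * np.asarray(gains)      # per-channel scaling (broadcasts)
    if linearize:
        img = img ** (1.0 / gamma)     # re-apply gamma for display
    return (np.clip(img, 0.0, 1.0) * 255.0).astype(np.uint8)
```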
1. Do you get different results if you take out the gamma transformation before or after
doing the multiplication? Why or why not?
2. Take the same picture with your digital camera using different color balance settings
(most cameras control the color balance from one of the menus). Can you recover what
the color balance ratios are between the different settings? You may need to put your
camera on a tripod and align the images manually or automatically to make this work.
Alternatively, use a color checker chart (Figure 10.3b), as discussed in Sections 2.3 and
10.1.1.
3. Can you think of any reason why you might want to perform a color twist (Sec-
tion 3.1.2) on the images? See also Exercise 2.8 for some related ideas.
Ex 3.2: Demosaicing. If you have access to the RAW image for the camera, perform the
demosaicing yourself (Section 10.3.1). If not, just subsample an RGB image in a Bayer
mosaic pattern. Instead of just bilinear interpolation, try one of the more advanced techniques
described in Section 10.3.1. Compare your result to the one produced by the camera. Does
your camera perform a simple linear mapping between RAW values and the color-balanced
values in a JPEG? Some high-end cameras have a RAW+JPEG mode, which makes this
comparison much easier.
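If you take the subsampling route, here is a sketch of the mosaic step and the baseline bilinear interpolation to improve upon (an RGGB layout is assumed; the kernels are the standard bilinear weights):

```python
import numpy as np
from scipy.ndimage import convolve

def bayer_mosaic(rgb):
    """Subsample a float RGB image into a single-channel RGGB Bayer mosaic."""
    H, W, _ = rgb.shape
    mosaic = np.zeros((H, W), np.float64)
    mosaic[0::2, 0::2] = rgb[0::2, 0::2, 0]            # R at even rows/cols
    mosaic[0::2, 1::2] = rgb[0::2, 1::2, 1]            # G
    mosaic[1::2, 0::2] = rgb[1::2, 0::2, 1]            # G
    mosaic[1::2, 1::2] = rgb[1::2, 1::2, 2]            # B at odd rows/cols
    return mosaic

def bilinear_demosaic(mosaic):
    """Baseline bilinear interpolation of an RGGB mosaic."""
    H, W = mosaic.shape
    masks = np.zeros((3, H, W))
    masks[0, 0::2, 0::2] = 1                           # R sample locations
    masks[1, 0::2, 1::2] = masks[1, 1::2, 0::2] = 1    # G sample locations
    masks[2, 1::2, 1::2] = 1                           # B sample locations
    k_rb = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 4.0
    k_g = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]]) / 4.0
    out = np.empty((H, W, 3))
    for c, k in zip(range(3), (k_rb, k_g, k_rb)):
        out[..., c] = convolve(mosaic * masks[c], k, mode='mirror')
    return out
```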
Ex 3.3: Compositing and reflections. Section 3.1.3 describes the process of compositing
an alpha-matted image on top of another. Answer the following questions and optionally
validate them experimentally:
1. Most captured images have gamma correction applied to them. Does this invalidate the
basic compositing equation (3.8) and, if so, how should it be fixed?
2. The additive (pure reflection) model may have limitations. What happens if the glass is
tinted, especially to a non-gray hue? How about if the glass is dirty or smudged? How
could you model wavy glass or other kinds of refractive objects?
Ex 3.4: Blue screen matting. Set up a blue or green background, e.g., by buying a large
piece of colored posterboard. Take a picture of the empty background, and then of the back-
ground with a new object in front of it. Pull the matte using the difference between each
colored pixel and its assumed corresponding background pixel, using one of the techniques
described in Section 3.1.3 or by Smith and Blinn (1996).
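A crude sketch of the difference-based pull (the soft thresholds t_lo and t_hi are hypothetical tuning constants; the techniques cited above are considerably more principled):

```python
import numpy as np

def pull_matte(background, composite, t_lo=0.05, t_hi=0.2):
    """Soft matte from per-pixel color distance to a clean background plate.

    background, composite: H x W x 3 float RGB arrays in [0, 1].
    Distances below t_lo map to alpha 0 and above t_hi to alpha 1.
    """
    diff = np.linalg.norm(composite - background, axis=2)
    return np.clip((diff - t_lo) / (t_hi - t_lo), 0.0, 1.0)
```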
Ex 3.5: Difference keying. Implement a difference keying algorithm (see Section 3.1.3)
(Toyama, Krumm et al. 1999), consisting of the following steps (steps 1 and 2 are sketched
after the list):
1. Compute the mean and variance (or median and robust variance) at each pixel in an
“empty” video sequence.
2. For each new frame, classify each pixel as foreground or background (set the back-
ground pixels to RGBA=0).
3. (Optional) Compute the alpha channel and composite over a new background.
4. (Optional) Clean up the image using morphology (Section 3.3.1), label the connected
components (Section 3.3.3), compute their centroids, and track them from frame to
frame. Use this to build a “people counter”.
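Here is a sketch of steps 1 and 2 (the threshold is a hypothetical default; a proper analysis of the per-pixel statistics would give a principled value):

```python
import numpy as np

def background_model(empty_frames):
    """Step 1: per-pixel mean and variance from an 'empty' sequence.

    empty_frames: T x H x W x 3 float array of background-only frames.
    """
    stack = np.asarray(empty_frames, dtype=np.float64)
    return stack.mean(axis=0), stack.var(axis=0) + 1e-6   # avoid divide-by-0

def classify_foreground(frame, mean, var, thresh=25.0):
    """Step 2: flag pixels far from the background model.

    Returns a boolean H x W mask; background pixels can then be zeroed
    out (RGBA = 0) or passed to the alpha computation of step 3.
    """
    d2 = ((frame - mean) ** 2 / var).sum(axis=2)   # normalized sq. distance
    return d2 > thresh
```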
Ex 3.6: Photo effects. Write a variety of photo enhancement or effects filters: contrast,
solarization (quantization), etc. Which ones are useful (perform sensible corrections) and
which ones are more creative (create unusual images)?
Ex 3.7: Histogram equalization. Compute the gray level (luminance) histogram for an im-
age and equalize it so that the tones look better (and the image is less sensitive to exposure
settings). You may want to use the following steps (steps 1, 2, and 5 are sketched after the
list):
1. Convert the color image to luminance (Section 3.1.2).
2. Compute the histogram, the cumulative distribution, and the compensation transfer
function (Section 3.1.4).
3. (Optional) Try to increase the “punch” in the image by ensuring that a certain fraction
of pixels (say, 5%) are mapped to pure black and white.
4. (Optional) Limit the local gain f′(I) in the transfer function. One way to do this is to
limit f(I) < γI or f′(I) < γ while performing the accumulation (3.9), keeping any
unaccumulated values “in reserve”. (I’ll let you figure out the exact details.)
5. Compensate the luminance channel through the lookup table and re-generate the color
image using color ratios (2.117).
6. (Optional) Color values that are clipped in the original image, i.e., have one or more
saturated color channels, may appear unnatural when remapped to a non-clipped value.
Extend your algorithm to handle this case in some useful way.
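Here is a minimal sketch of steps 1, 2, and 5 (the Rec. 601 luma weights are my choice; Section 3.1.2 discusses alternatives):

```python
import numpy as np

def equalize_luminance(rgb):
    """Global histogram equalization of an H x W x 3 uint8 image."""
    img = rgb.astype(np.float64)
    Y = img @ np.array([0.299, 0.587, 0.114])   # step 1: luminance
    hist, _ = np.histogram(Y, bins=256, range=(0, 256))
    cdf = hist.cumsum() / hist.sum()            # step 2: cumulative distribution
    f = 255.0 * cdf                             # compensation transfer function
    Y_new = np.interp(Y, np.arange(256), f)     # apply the lookup table
    ratio = Y_new / np.maximum(Y, 1e-6)         # step 5: color ratios
    return np.clip(img * ratio[..., None], 0, 255).astype(np.uint8)
```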
Ex 3.8: Local histogram equalization. Compute gray level (luminance) histograms for
each patch of a grid overlaid on the image, distributing each pixel’s count to the adjacent
grid vertices with distance-based (spline) weights.
1. Build on Exercise 3.7 (luminance computation).
2. Distribute values (counts) to adjacent vertices (bilinear).
3. Convert to CDF (look-up functions).
4. (Optional) Use low-pass filtering of CDFs.
5. Interpolate adjacent CDFs for final lookup.
Ex 3.9: Padding for neighborhood operations. Write down the formulas for computing
the padded pixel values f̃(i, j) as a function of the original pixel values f(k, l) and the image
width and height (M, N) for each of the padding modes shown in Figure 3.13. For example,
for replication (clamping),

f̃(i, j) = f(k, l), where k = max(0, min(M − 1, i)) and l = max(0, min(N − 1, j)).
(Hint: you may want to use the min, max, mod, and absolute value operators in addition to
the regular arithmetic operators.)
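Here is a per-pixel sketch of four common modes (the mode names are informal; the mirror formula assumes M, N ≥ 2 and does not repeat the border pixel):

```python
import numpy as np

def pad_lookup(f, i, j, mode='clamp'):
    """Evaluate a NumPy image f (M x N) at a possibly out-of-range (i, j)."""
    M, N = f.shape
    if mode == 'zero':
        return f[i, j] if (0 <= i < M and 0 <= j < N) else 0
    if mode == 'clamp':          # replication, as in the formula above
        k = max(0, min(M - 1, i))
        l = max(0, min(N - 1, j))
    elif mode == 'wrap':         # cyclic repetition (tiling)
        k, l = i % M, j % N
    elif mode == 'mirror':       # reflect about the border pixels
        k = M - 1 - abs((i % (2 * M - 2)) - (M - 1))
        l = N - 1 - abs((j % (2 * N - 2)) - (N - 1))
    else:
        raise ValueError(mode)
    return f[k, l]
```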
• Describe in more detail the advantages and disadvantages of these various modes.
• (Optional) Check what your graphics card does by drawing a texture-mapped rectangle
where the texture coordinates lie beyond the [0.0, 1.0] range and using different texture
clamping modes.
Ex 3.10: Separable filters. Implement convolution with a separable kernel. The input should
be a grayscale or color image along with the horizontal and vertical kernels. Make sure you
support the padding mechanisms developed in the previous exercise. You will need this func-
tionality for some of the later exercises. If you already have access to separable filtering in an
image processing package you are using (such as IPL), skip this exercise.
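Here is a compact sketch using shifted-slice accumulation, with np.pad supplying the border modes from the previous exercise ('edge' = clamp, 'wrap' = cyclic, 'symmetric' = mirror, 'constant' = zero); odd-length kernels are assumed:

```python
import numpy as np

def separable_convolve(image, kh, kv, pad_mode='edge'):
    """Convolve every channel with kh along rows, then kv along columns."""
    kh = np.asarray(kh, np.float64)[::-1]   # flip for true convolution
    kv = np.asarray(kv, np.float64)[::-1]
    img = np.atleast_3d(image).astype(np.float64)
    H, W, _ = img.shape
    rh, rv = len(kh) // 2, len(kv) // 2
    # horizontal pass: pad the columns, then accumulate shifted slices
    p = np.pad(img, ((0, 0), (rh, rh), (0, 0)), mode=pad_mode)
    tmp = sum(kh[t] * p[:, t:t + W, :] for t in range(len(kh)))
    # vertical pass
    p = np.pad(tmp, ((rv, rv), (0, 0), (0, 0)), mode=pad_mode)
    return sum(kv[t] * p[t:t + H, :, :] for t in range(len(kv)))
```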
• (Optional) Use Pietro Perona’s (1995) technique to approximate convolution as a sum
of a number of separable kernels. Let the user specify the number of kernels and report
back some sensible metric of the approximation fidelity.
Ex 3.11: Discrete Gaussian filters. Discuss the following issues with implementing a dis-
crete Gaussian filter:
• If you just sample the equation of a continuous Gaussian filter at discrete locations,
will you get the desired properties, e.g., will the coefficients sum up to 1 (see the sketch
after this list)? Similarly, if you sample a derivative of a Gaussian, do the samples sum
up to 0 or have vanishing higher-order moments?
• Would it be preferable to take the original signal, interpolate it with a sinc, blur with a
continuous Gaussian, then prefilter with a sinc before re-sampling? Is there a simpler
way to do this in the frequency domain?
• Would it make more sense to produce a Gaussian frequency response in the Fourier
domain and to then take an inverse FFT to obtain a discrete filter?
• How does truncation of the filter change its frequency response? Does it introduce any
additional artifacts?
• Are the resulting two-dimensional filters as rotationally invariant as their continuous
analogs? Is there some way to improve this? In fact, can any 2D discrete (separable or
non-separable) filter be truly rotationally invariant?
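A few lines of NumPy are enough to explore the first question (the 3σ truncation radius is just a common rule of thumb):

```python
import numpy as np

def sampled_gaussian(sigma, radius=None):
    """Sample a unit-area continuous Gaussian at integer locations."""
    if radius is None:
        radius = int(np.ceil(3 * sigma))   # common truncation choice
    x = np.arange(-radius, radius + 1)
    return np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

g = sampled_gaussian(1.0)
print(g.sum())   # close to, but not exactly, 1; hence the usual g / g.sum()
```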
Ex 3.12: Sharpening, blur, and noise removal. Implement some softening, sharpening, and
non-linear diffusion (selective sharpening or noise removal) filters, such as Gaussian, median,
and bilateral (Section 3.3.1), as discussed in Section 3.4.2.
Take blurry or noisy images (shooting in low light is a good way to get both) and try to
improve their appearance and legibility.
Ex 3.13: Steerable filters. Implement Freeman and Adelson’s (1991) steerable filter algo-
rithm. The input should be a grayscale or color image and the output should be a multi-banded
image consisting of the oriented filter responses G1 at 0° and G1 at 90°. The coefficients for
the filters can be found in the paper by Freeman and Adelson (1991).
Test the various order filters on a number of images of your choice and see if you can
reliably find corner and intersection features. These filters will be quite useful later to detect
elongated structures, such as lines (Section 7.4).
Figure 3.52 Sample images for testing the quality of resampling algorithms: (a) a synthetic
chirp; (b) and (c) some high-frequency images from the image compression community.
Ex 3.14: Bilateral and guided image filters. Implement or download code for bilateral and/or
guided image filtering and use this to implement some image enhancement or processing ap-
plication, such as those described in Section 3.3.2.
Ex 3.15: Fourier transform. Prove the properties of the Fourier transform listed in Szeliski
(2010, Table 3.1) and derive the formulas for the Fourier transform pairs listed in Szeliski
(2010, Table 3.2) and Table 3.1. These exercises will help you become comfortable working
with Fourier transforms, an invaluable skill when analyzing and designing the behavior and
efficiency of many computer vision algorithms.
Ex 3.16: High-quality image resampling. Implement several of the low-pass filters pre-
sented in Section 3.5.2 and also the windowed sinc shown in Figure 3.28. Feel free to imple-
ment other filters (Wolberg 1990; Unser 1999).
Apply your filters to continuously resize an image, both magnifying (interpolating) and
minifying (decimating) it; compare the resulting animations for several filters. Use both a
synthetic chirp image (Figure 3.52a) and natural images with lots of high-frequency detail
(Figure 3.52b–c).
You may find it helpful to write a simple visualization program that continuously plays the
animations for two or more filters at once and that lets you “blink” between different results.
Discuss the merits and deficiencies of each filter, as well as the tradeoff between speed
and quality.
Ex 3.17: Pyramids. Construct an image pyramid. The inputs should be a grayscale or
color image, a separable filter kernel, and the number of desired levels. Implement at least
the following kernels:
• 2 × 2 block filtering;
• Burt and Adelson’s binomial kernel 1/16(1, 4, 6, 4, 1) (Burt and Adelson 1983a);
• a high-quality seven- or nine-tap filter.
Compare the visual quality of the various decimation filters. Also, shift your input image by
1 to 4 pixels and compare the resulting decimated (quarter size) image sequence.
Ex 3.18: Pyramid blending. Write a program that takes as input two color images and a
binary mask image and produces the Laplacian pyramid blend of the two images.
1. Construct the Laplacian pyramid for each image.
2. Construct the Gaussian pyramid for the two mask images (the input image and its
complement).
3. Multiply each Laplacian image by its corresponding mask and sum the images (see
Figure 3.41).
4. Reconstruct the final image from the blended Laplacian pyramid.
Generalize your algorithm to input n images and a label image with values 1. . . n (the value
0 can be reserved for “no input”). Discuss whether the weighted summation stage (step 3)
needs to keep track of the total weight for renormalization, or whether the math just works
out. Use your algorithm either to blend two differently exposed images (to avoid under- and
over-exposed regions) or to make a creative blend of two different scenes.
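Here is a NumPy sketch of the whole pipeline, built on the binomial kernel of Exercise 3.17 (the mask is expected as an H x W x 1 float array, and the image dimensions are assumed to survive the repeated halving):

```python
import numpy as np
from scipy.ndimage import convolve1d

BINOMIAL = np.array([1, 4, 6, 4, 1]) / 16.0          # Burt-Adelson kernel

def blur(img):
    out = convolve1d(img, BINOMIAL, axis=0, mode='reflect')
    return convolve1d(out, BINOMIAL, axis=1, mode='reflect')

def down(img):
    return blur(img)[::2, ::2]

def up(small, shape):
    out = np.zeros(shape)
    out[::2, ::2] = small
    return 4.0 * blur(out)       # 4x restores energy lost to inserted zeros

def laplacian_pyramid(img, levels):
    pyr = []
    for _ in range(levels - 1):
        small = down(img)
        pyr.append(img - up(small, img.shape))       # band-pass residual
        img = small
    pyr.append(img)                                  # low-pass base
    return pyr

def pyramid_blend(A, B, mask, levels=5):
    """Blend float images A, B (H x W x 3) with mask (H x W x 1, 1 selects A)."""
    lA = laplacian_pyramid(A, levels)
    lB = laplacian_pyramid(B, levels)
    gM = [mask]                                      # Gaussian pyramid of the mask
    for _ in range(levels - 1):
        gM.append(down(gM[-1]))
    out = gM[-1] * lA[-1] + (1 - gM[-1]) * lB[-1]    # blend the base level
    for g, a, b in zip(gM[-2::-1], lA[-2::-1], lB[-2::-1]):
        out = up(out, a.shape) + g * a + (1 - g) * b # blend and reconstruct
    return out
```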
Ex 3.19: Pyramid blending in PyTorch. Re-write your pyramid blending exercise in Py-
Torch.
1. PyTorch has support for all of the primitives you need, i.e., fixed size convolutions
(make sure they filter each channel separately), downsampling, upsampling, and addi-
tion, subtraction, and multiplication (although the latter is rarely used).
2. The goal of this exercise is not to train the convolution weights, but just to become
familiar with the DNN primitives available in PyTorch.
3. Compare your results to the ones using a standard Python or C++ computer vision
library. They should be identical.
4. Discuss whether you like this API better or worse for these kinds of fixed pipeline
imaging tasks.
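As a starting point, the key primitive is a fixed-weight depthwise (grouped) convolution, which filters each channel separately; a sketch (the weights are the binomial kernel, deliberately not learned):

```python
import torch
import torch.nn.functional as F

k1 = torch.tensor([1., 4., 6., 4., 1.]) / 16.0
k2 = torch.outer(k1, k1)                         # separable -> 5 x 5 kernel

def blur(x):
    """x: N x C x H x W tensor; binomial blur applied per channel."""
    C = x.shape[1]
    weight = k2.to(x).expand(C, 1, 5, 5)         # one copy per channel
    x = F.pad(x, (2, 2, 2, 2), mode='reflect')   # mirror border padding
    return F.conv2d(x, weight, groups=C)         # groups=C -> depthwise

def down(x):
    return blur(x)[..., ::2, ::2]

def up(x):
    out = torch.zeros(x.shape[0], x.shape[1], 2 * x.shape[2],
                      2 * x.shape[3], dtype=x.dtype, device=x.device)
    out[..., ::2, ::2] = x
    return 4.0 * blur(out)
```

The pyramid construction and blending then reduce to element-wise tensor arithmetic on these primitives.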
Ex 3.20: Local Laplacian—challenging. Implement the local Laplacian contrast manipu-
lation technique (Paris, Hasinoff, and Kautz 2011; Aubry, Paris et al. 2014) and use this to
implement edge-preserving filtering and tone manipulation.
Ex 3.21: Wavelet construction and applications. Implement one of the wavelet families
described in Section 3.5.4 or by Simoncelli and Adelson (1990b), as well as the basic Lapla-
cian pyramid (Exercise 3.17). Apply the resulting representations to one of the following two
tasks:
• Compression: Compute the entropy in each band for the different wavelet implemen-
tations, assuming a given quantization level (say, 1/4 gray level, to keep the rounding
error acceptable). Quantize the wavelet coefficients and reconstruct the original im-
ages. Which technique performs better? (See Simoncelli and Adelson (1990b) or any
of the multitude of wavelet compression papers for some typical results.)
• Denoising. After computing the wavelets, suppress small values using coring, i.e., set
small values to zero using a piecewise linear or other C⁰ function (a minimal sketch
follows). Compare the results of your denoising using different wavelet and pyramid
representations.
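For the denoising option, a minimal soft-threshold coring function might look like this (the threshold t would typically scale with the estimated noise level in each band):

```python
import numpy as np

def core(coeffs, t):
    """Piecewise-linear (C0) coring: zero out small coefficients and
    shrink the remaining ones toward zero by t."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - t, 0.0)
```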
Ex 3.22: Parametric image warping. Write the code to do affine and perspective image
warps (optionally bilinear as well). Try a variety of interpolants and report on their visual
quality. In particular, discuss the following:
• In a MIP-map, selecting only the coarser level adjacent to the computed fractional
level will produce a blurrier image, while selecting the finer level will lead to aliasing.
Explain why this is so and discuss whether blending an aliased and a blurred image
(tri-linear MIP-mapping) is a good idea.
• When the ratio of the horizontal and vertical resampling rates becomes very different
(anisotropic), the MIP-map performs even worse. Suggest some approaches to reduce
such problems.
Ex 3.23: Local image warping. Open an image and deform its appearance in one of the
following ways:
1. Click on a number of pixels and move (drag) them to new locations. Interpolate the
resulting sparse displacement field to obtain a dense motion field (Sections 3.6.2 and
3.5.1).
2. Draw a number of lines in the image. Move the endpoints of the lines to specify their
new positions and use the Beier–Neely interpolation algorithm (Beier and Neely 1992),
discussed in Section 3.6.2, to get a dense motion field.
3. Overlay a spline control grid and move one grid point at a time (optionally select the
level of the deformation).
4. Have a dense per-pixel flow field and use a soft “paintbrush” to design a horizontal and
vertical velocity field.
5. (Optional): Prove whether the Beier–Neely warp does or does not reduce to a sparse
point-based deformation as the line segments become shorter (reduce to points).
Ex 3.24: Forward warping. Given a displacement field from the previous exercise, write
a forward warping algorithm:
1. Write a forward warper using splatting, either nearest neighbor or soft accumulation
(Section 3.6.1); a minimal sketch follows this list.
2. Write a two-pass algorithm that forward warps the displacement field, fills in small
holes, and then uses inverse warping (Shade, Gortler et al. 1998).
3. Compare the quality of these two algorithms.
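Here is a sketch of step 1 using bilinear (“soft”) splatting; destination pixels that receive no weight remain holes for the two-pass variant of step 2 to fill:

```python
import numpy as np

def forward_warp(src, flow):
    """Forward warp src (H x W x C floats) by flow (H x W x 2, as (dx, dy))."""
    H, W, C = src.shape
    y, x = np.mgrid[0:H, 0:W]
    xd, yd = x + flow[..., 0], y + flow[..., 1]    # destination coordinates
    x0, y0 = np.floor(xd).astype(int), np.floor(yd).astype(int)
    fx, fy = xd - x0, yd - y0                      # bilinear fractions
    acc = np.zeros((H, W, C))                      # color accumulator
    wgt = np.zeros((H, W))                         # weight accumulator
    for dy in (0, 1):                              # splat onto 4 neighbors
        for dx in (0, 1):
            xi, yi = x0 + dx, y0 + dy
            w = np.where(dx, fx, 1 - fx) * np.where(dy, fy, 1 - fy)
            ok = (xi >= 0) & (xi < W) & (yi >= 0) & (yi < H)
            np.add.at(acc, (yi[ok], xi[ok]), w[ok, None] * src[ok])
            np.add.at(wgt, (yi[ok], xi[ok]), w[ok])
    return acc / np.maximum(wgt, 1e-8)[..., None]  # zero weight = hole
```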
Ex 3.25: Feature-based morphing. Extend the warping code you wrote in Exercise 3.23
to import two different images and specify correspondences (point, line, or mesh-based) be-
tween the two images.
1. Create a morph by partially warping the images towards each other and cross-dissolving
(Section 3.6.3).
2. Try using your morphing algorithm to perform an image rotation and discuss whether
it behaves the way you want it to.
Ex 3.26: 2D image editor. Extend the program you wrote in Exercise 2.2 to import images
and let you create a “collage” of pictures. You should implement the following steps:
1. Open up a new image (in a separate window).
2. Shift drag (rubber-band) to crop a subregion (or select whole image).
3. Paste into the current canvas.
4. Select the deformation mode (motion model): translation, rigid, similarity, affine, or
perspective.
5. Drag any corner of the outline to change its transformation.
Figure 3.53 There is a faint image of a rainbow visible in the right-hand side of this picture.
Can you think of a way to enhance it (Exercise 3.29)?
6. (Optional) Change the relative ordering of the images and which image is currently
being manipulated.
The user should see the composition of the various images’ pieces on top of each other.
This exercise should be built on the image transformation classes supported in the soft-
ware library. Persistence of the created representation (save and load) should also be sup-
ported (for each image, save its transformation).
Ex 3.27: 3D texture-mapped viewer. Extend the viewer you created in Exercise 2.3 to in-
clude texture-mapped polygon rendering. Augment each polygon with (u, v, w) coordinates
into an image.
Ex 3.28: Image denoising. Implement at least two of the various image denoising tech-
niques described in this chapter and compare them on both synthetically noised image se-
quences and real-world (low-light) sequences. Does the performance of the algorithm de-
pend on the correct choice of noise level estimate? Can you draw any conclusions as to
which techniques work better?
Ex 3.29: Rainbow enhancer—challenging. Take a picture containing a rainbow, such as
Figure 3.53, and enhance the strength (saturation) of the rainbow.
1. Draw an arc in the image delineating the extent of the rainbow.
2. Fit an additive rainbow function (explain why it is additive) to this arc (it is best to work
with linearized pixel values), using the spectrum as the cross-section, and estimating
the width of the arc and the amount of color being added. This is the trickiest part of
the problem, as you need to tease apart the (low-frequency) rainbow pattern and the
natural image hiding behind it.
3. Amplify the rainbow signal and add it back into the image, re-applying the gamma
function if necessary to produce the final image.