Pedestrian detection with HOG features
The World Bank estimates that each year car accidents kill about 1.2 million people, of whom about
two thirds are pedestrians. This means that detecting pedestrians is an important application
problem, because cars that can automatically detect and avoid pedestrians might save many lives.
Pedestrians wear many different kinds of clothing and appear in many different configurations, but,
at relatively low resolution, pedestrians can have a fairly characteristic appearance.
The most usual cases are lateral or frontal views of a walk. In these cases, we see either a “lollipop”
shape — the torso is wider than the legs, which are together in the stance phase of the walk — or a
“scissor” shape — where the legs are swinging in the walk. We expect to see some evidence of arms
and legs, and the curve around the shoulders and head also tends to be visible and quite distinctive.
This means that, with a careful feature construction, we can build a useful moving-window
pedestrian detector.
There isn’t always a strong contrast between the pedestrian and the background, so it is better to
use orientations rather than edges to represent the image window. Pedestrians can move their arms and
legs around, so we should use a histogram to suppress some spatial detail in the feature. We break
up the window into cells, which could overlap, and build an orientation histogram in each cell. Doing
so will produce a feature that can tell whether the head-and-shoulders curve is at the top of the
window or at the bottom, but will not change if the head moves slightly.
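As a minimal sketch of this idea, the following plain-Python function builds an unweighted orientation histogram for one cell. The function name, and the choice of nine 20-degree bins over [0°, 180°), are illustrative assumptions rather than a fixed standard:

```python
import math

def cell_orientation_histogram(gx, gy, n_bins=9):
    """Unweighted orientation histogram for one cell.

    gx, gy: gradient components at each pixel of the cell.
    Orientations are folded into [0, 180) degrees, so an edge
    and its contrast-reversed twin vote for the same bin.
    """
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for dx, dy in zip(gx, gy):
        theta = math.degrees(math.atan2(dy, dx)) % 180.0
        hist[int(theta // bin_width) % n_bins] += 1.0
    return hist
```

Because the histogram only counts how many gradients fall in each orientation bin, small shifts of the head or limbs within a cell leave the feature almost unchanged, which is exactly the spatial tolerance described above.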
One further trick is required to make a good feature. Because orientation features are not affected
by illumination brightness, we cannot treat high-contrast edges specially. This means that the
distinctive curves on the boundary of a pedestrian are treated in the same way as fine texture detail
in clothing or in the background, and so the signal may be submerged in noise. We can recover
contrast information by counting gradient orientations with weights that reflect how significant a
gradient is compared to other gradients in the same cell. We will write ||∇I_x|| for the gradient
magnitude at point x in the image, write C for the cell whose histogram we wish to compute, and
write w_{x,C} for the weight that we will use for the orientation at x for this cell. A natural choice of
weight is

    w_{x,C} = ||∇I_x|| / Σ_{u∈C} ||∇I_u||.
This compares the gradient magnitude to others in the cell, so gradients that are large compared to
their neighbors get a large weight. The resulting feature is usually called a HOG feature (for
Histogram Of Gradient orientations).
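A minimal sketch of this weighting scheme, under the same illustrative binning as before: each pixel votes with weight w_{x,C} = ||∇I_x|| / Σ_{u∈C} ||∇I_u||, so the histogram sums to one and locally strong gradients dominate. The function name is hypothetical:

```python
import math

def weighted_cell_histogram(gx, gy, n_bins=9):
    """Orientation histogram where each pixel votes with weight
    ||grad at pixel|| / (sum of ||grad|| over the cell)."""
    mags = [math.hypot(dx, dy) for dx, dy in zip(gx, gy)]
    total = sum(mags) or 1.0  # guard against an all-flat cell
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for dx, dy, m in zip(gx, gy, mags):
        theta = math.degrees(math.atan2(dy, dx)) % 180.0
        hist[int(theta // bin_width) % n_bins] += m / total
    return hist
```

With this normalization, a high-contrast boundary edge contributes far more than fine texture in the same cell, which is precisely the contrast information the weighting is meant to recover.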
This feature construction is the main way in which pedestrian detection differs from face detection.
Otherwise, building a pedestrian detector is very like building a face detector. The detector sweeps a
window across the image, computes features for that window, and presents them to a classifier.
Non-maximum suppression is then applied to the output. In most applications, the scale and
orientation of typical pedestrians are known. For example, in driving applications in which a camera is
fixed to the car, we expect to view mainly vertical pedestrians, and we are interested only in nearby
pedestrians. Several pedestrian data sets have been published, and these can be used for training
the classifier. Pedestrians are not the only type of object we can detect. In Figure 24.15 we see that
similar techniques can be used to find a variety of objects in different contexts.
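The sweep-and-suppress pipeline can be sketched as follows. Here `score_window` is a hypothetical stand-in for the feature-extraction-plus-classifier stage, and the window size, stride, and overlap threshold are illustrative assumptions:

```python
def detect(image_w, image_h, score_window, win_w=64, win_h=128,
           stride=8, threshold=0.5):
    """Sweep a fixed-size window over the image; score_window(x, y)
    stands in for computing features and applying the classifier."""
    hits = []
    for y in range(0, image_h - win_h + 1, stride):
        for x in range(0, image_w - win_w + 1, stride):
            s = score_window(x, y)
            if s > threshold:
                hits.append((s, x, y))
    return non_max_suppression(hits, win_w, win_h)

def non_max_suppression(hits, w, h, max_overlap=0.3):
    """Greedy NMS: keep the best-scoring window, drop any other
    window that overlaps a kept one by more than max_overlap IoU."""
    kept = []
    for s, x, y in sorted(hits, reverse=True):
        if all(iou((x, y), (kx, ky), w, h) <= max_overlap
               for _, kx, ky in kept):
            kept.append((s, x, y))
    return kept

def iou(a, b, w, h):
    """Intersection over union of two w-by-h boxes at corners a, b."""
    ix = max(0, min(a[0], b[0]) + w - max(a[0], b[0]))
    iy = max(0, min(a[1], b[1]) + h - max(a[1], b[1]))
    inter = ix * iy
    return inter / (2 * w * h - inter)
```

Non-maximum suppression matters because a true pedestrian typically fires the classifier at many nearby window positions; without it, one person would be reported as a cluster of detections.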
Local orientation histograms are a powerful feature for recognizing even quite complex objects. On
the left, an image of a pedestrian. On the center left, local orientation histograms for patches. We
then apply a classifier such as a support vector machine to find the weights for each histogram that
best separate the positive examples of pedestrians from non-pedestrians. We see that the positively
weighted components look like the outline of a person. The negative components are less clear; they
represent all the patterns that are not pedestrians.
Another example of object recognition, this one using the SIFT feature (Scale-Invariant Feature
Transform), an earlier version of the HOG feature. On the left, images of a shoe and a telephone that
serve as object models. In the center, a test image. On the right, the shoe and the telephone have
been detected by: finding points in the image whose SIFT feature descriptions match a model;
computing an estimate of the pose of the model; and verifying that estimate. A strong match that
passes this verification step is rarely a false positive.