Spring 2016 CS543 / ECE549
Computer Vision
Course webpage URL: http://slazebni.cs.illinois.edu/spring16/
The goal of computer vision
• To extract “meaning” from pixels
What we see vs. what a computer sees
Source: S. Narasimhan
Humans are remarkably good at this…
Source: “80 million tiny images” by Torralba et al.
What kind of information can be extracted from an image?
Semantic information: object labels (tree, roof, sky, chimney, building, window, door, trashcan, car, person, ground) and scene labels (outdoor scene, city, European, …)
Geometric information
Why study computer vision?
• Vision is useful
• Vision is interesting
• Vision is difficult
• Half of primate cerebral cortex is devoted to visual processing
• Achieving human-level image understanding is probably “AI-complete”
Successes of computer vision to date
“Simple” patterns
Faces
Face movies
I. Kemelmacher-Shlizerman, E. Shechtman, R. Garg, and S. Seitz, “Exploring Photobios,” SIGGRAPH 2011
Automatic age progression
I. Kemelmacher-Shlizerman, S. Suwajanakorn, and S. Seitz, “Illumination-Aware Age Progression,” CVPR 2014
Digital puppetry
S. Suwajanakorn, S. Seitz, and I. Kemelmacher-Shlizerman, “What Makes Tom Hanks Look Like Tom Hanks,” ICCV 2015
Reconstruction: 3D from photo collections
Q. Shan, R. Adams, B. Curless, Y. Furukawa, and S. Seitz, “The Visual Turing Test for Scene Reconstruction,” 3DV 2013
Reconstruction: 4D from photo collections
R. Martin-Brualla, D. Gallup, and S. Seitz, “Time-Lapse Mining from Internet Photos,” SIGGRAPH 2015
Reconstruction: 4D from depth cameras
R. Newcombe, D. Fox, and S. Seitz, “DynamicFusion: Reconstruction and Tracking of Non-rigid Scenes in Real-Time,” CVPR 2015
Recognition
• “Computer Eyesight Gets a Lot More Accurate,” NY Times Bits blog, August 18, 2014
• “Building A Deeper Understanding of Images,” Google Research Blog, September 5, 2014
• “Baidu caught gaming recent supercomputer performance test,” Engadget, June 3, 2015
Self-driving cars
http://www.nytimes.com/2016/01/18/technology/driverless-cars-limits-include-human-nature.html
Why is computer vision difficult?
Challenges: viewpoint variation
Challenges: illumination
image credit: J. Koenderink
Challenges: scale
slide credit: Fei-Fei, Fergus & Torralba
Challenges: deformation
Xu, Beihong 1943
slide credit: Fei-Fei, Fergus & Torralba
Challenges: object intra-class variation
slide credit: Fei-Fei, Fergus & Torralba
Challenges: occlusion, clutter
Image source: National Geographic
Challenges: motion
Challenges: ambiguity
slide credit: Fei-Fei, Fergus & Torralba
Challenges: ambiguity
• Many different 3D scenes could have given rise to a particular 2D picture
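A minimal sketch of why this ambiguity arises, assuming an idealized pinhole camera. The camera model, the `project` function, and the focal length are illustrative assumptions, not taken from the slides. Every 3D point along the same viewing ray projects to the same image location, so a single projected point does not determine depth.

```python
def project(point, focal_length=1.0):
    """Pinhole projection of a 3D point (X, Y, Z), Z > 0, onto the image plane."""
    X, Y, Z = point
    return (focal_length * X / Z, focal_length * Y / Z)


# Two different 3D points on the same ray through the camera center.
near = (1.0, 2.0, 4.0)
far = tuple(3.0 * c for c in near)  # three times farther along the same ray

print(project(near))  # (0.25, 0.5)
print(project(far))   # (0.25, 0.5)
```

Both points map to the same image coordinates, which is one concrete sense in which many 3D scenes are consistent with one 2D picture.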
Challenges or opportunities?
• Images are confusing, but they also reveal the structure of the world through numerous cues
• Our job is to interpret the cues!
Depth cues: Linear perspective
Depth cues: Parallax
Shape cues: Texture gradient
Shape and lighting cues: Shading
Michelangelo 1475-1564
slide credit: Fei-Fei, Fergus & Torralba
Grouping cues: Similarity (color, texture, proximity)
Grouping cues: “Common fate”
Image credit: Arthus-Bertrand (via F. Durand)
Origins of computer vision
L. G. Roberts, “Machine Perception of Three Dimensional Solids,” Ph.D. thesis, MIT Department of Electrical Engineering, 1963.
Origins of computer vision
Source: Fei-Fei Li
Connections to other disciplines
Computer Vision (center), connected to:
• Artificial Intelligence
• Robotics
• Machine Learning
• Computer Graphics
• Cognitive science
• Neuroscience
• Image Processing
The computer vision industry
• Corporate sponsors of CVPR 2015:
Course overview
I. Early vision: Image formation and processing
II. Mid-level vision: Grouping and fitting
III. Multi-view geometry
IV. Recognition
V. Additional topics
I. Early vision
• Basic image formation and processing
Linear filtering
Edge detection
Cameras and sensors
Light and color
Feature extraction, feature tracking
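Linear filtering and edge detection, two of the topics listed above, can be illustrated with a small sketch in plain Python. The toy image, the difference kernel, and the helper name `filter2d` are assumptions made for illustration. The helper computes cross-correlation over the valid region only; true convolution would flip the kernel first, which for this kernel only changes the sign of the response.

```python
def filter2d(image, kernel):
    """Cross-correlate a 2D list `image` with `kernel` (valid region only)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Toy image: dark left half (0), bright right half (9), i.e. a vertical edge.
image = [[0, 0, 0, 9, 9, 9] for _ in range(4)]

# Horizontal difference kernel: responds where intensity changes left to right.
kernel = [[-1, 0, 1]]

response = filter2d(image, kernel)
for row in response:
    print(row)  # [0, 9, 9, 0]
```

The response is zero in the flat regions and nonzero only around the intensity step, which is the basic idea behind gradient-based edge detection.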
II. “Mid-level vision”
• Fitting and grouping
Fitting: Least squares, Hough transform, RANSAC
Alignment
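Two of the fitting techniques named above, least squares and RANSAC, can be sketched for the simplest case of fitting a line y = a*x + b. This is a minimal illustration under assumptions: the function names, the inlier threshold, the iteration count, the fixed seed, and the toy data are all hypothetical choices, not material from the course.

```python
import random


def fit_line_least_squares(points):
    """Fit y = a*x + b by ordinary least squares; returns (a, b)."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b


def ransac_line(points, threshold=0.5, iterations=100, seed=0):
    """Fit a line robustly: sample point pairs, keep the largest inlier set,
    then refit that set with least squares."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # a vertical line cannot be written as y = a*x + b
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = [(x, y) for x, y in points
                   if abs(y - (a * x + b)) <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return fit_line_least_squares(best_inliers), best_inliers


# Five points on y = 2x + 1 plus one gross outlier at (2, 20).
points = [(0, 1), (1, 3), (2, 5), (3, 7), (4, 9), (2, 20)]

plain = fit_line_least_squares(points)     # pulled away from the true line
(robust_a, robust_b), inliers = ransac_line(points)

print("least squares on all points:", plain)
print("RANSAC + refit:", (robust_a, robust_b), "inliers:", len(inliers))
```

Plain least squares uses every point, so a single outlier shifts the fit. The RANSAC sketch scores candidate lines by how many points agree with them and refits only on the agreeing points.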
III. Multi-view geometry
Epipolar geometry
Stereo
Structure from motion
3D Photography
IV. Recognition
Instance recognition, large-scale alignment
Image classification
Object detection
Deep learning
V. Additional Topics (time permitting)
Segmentation
Video
RGBD images
Images and text