Computer Vision
Introduction
What is Computer Vision?
• Make computers understand images and videos.
• What kind of scene?
• Where are the cars?
• How far is the building?
• What are they doing?
• Why is this happening?
• What is important?
• What will I see?
Computer Vision and Nearby Fields
[Diagram relating Computer Vision to Digital Image Processing, Computational Photography, and Computer Graphics; labels: Geometry (3D), Shape, Images (2D), Photometry, Appearance]
Machine learning:
Vision = Machine learning applied to visual data
Image Processing vs Computer Vision
• Image Processing
• Mostly concerned with image-to-image transformations
• Filtering
• Enhancement
• Compression
• Computer Vision
• Concerned with how images reflect the 3D world
• Filtering for feature extraction
• Enhancement for recognition/detection
• Compression that preserves geometric information in images
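The contrast above can be made concrete with a small sketch. The same filtering machinery serves both fields: a smoothing kernel yields another image (the image processing view), while a derivative kernel yields edge evidence that a recognition system could consume (the computer vision view). The helper name `correlate_rows`, the kernels, and the toy image are illustrative assumptions, not taken from the slides.

```python
# Toy grayscale image: dark left half, bright right half (one vertical edge).
image = [
    [10, 10, 10, 200, 200, 200],
    [10, 10, 10, 200, 200, 200],
    [10, 10, 10, 200, 200, 200],
]

def correlate_rows(img, kernel):
    """Slide a 1D kernel along each row (valid region only, no padding)."""
    k = len(kernel)
    out = []
    for row in img:
        out.append([
            sum(kernel[j] * row[i + j] for j in range(k))
            for i in range(len(row) - k + 1)
        ])
    return out

# Image processing view: a box blur maps an image to a smoother image.
blurred = correlate_rows(image, [1 / 3, 1 / 3, 1 / 3])

# Computer vision view: a central-difference filter exposes the edge.
gradient = correlate_rows(image, [-1, 0, 1])

# First position in the gradient map where the response is strongest.
edge_strength = [abs(v) for v in gradient[0]]
edge_column = edge_strength.index(max(edge_strength))
```

Here `blurred` is still something one would look at, whereas `gradient` and `edge_column` are measurements about the scene, which is the distinction the slide draws.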
Visual data on the Internet
• Flickr
• 10+ billion photographs
• 60 million images uploaded a month
• Facebook
• 250 billion+
• 300 million a day
• Instagram
• 55 million a day
• YouTube
• 100 hours uploaded every minute
90% of net traffic will be visual!
Mostly about cats
Too big for humans
http://www.petittube.com/
• Need automatic tools to access and analyze visual data!
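A quick back-of-the-envelope calculation shows why this volume is "too big for humans". The sketch below uses only the YouTube figure quoted above (100 hours uploaded every minute); the variable names are illustrative.

```python
HOURS_UPLOADED_PER_MINUTE = 100  # YouTube figure quoted on the slide

# 100 hours of video per wall-clock minute is 6000 minutes per minute,
# so this many people would have to watch nonstop just to keep pace.
viewers_needed = HOURS_UPLOADED_PER_MINUTE * 60

# Total footage uploaded in one day, in hours.
hours_per_day = HOURS_UPLOADED_PER_MINUTE * 60 * 24
```

Under that single figure, keeping up would take 6,000 people watching around the clock, and one day of uploads amounts to 144,000 hours of video.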
Vision is Really Hard
• Vision is an amazing feature of natural intelligence
• Visual cortex occupies about 50% of the macaque brain
• More of the human brain is devoted to vision than to anything else
Is that a queen or a bishop?
Why is Computer Vision Hard?
What did you see?
• Where was this picture taken?
• How many people are there?
• What are they doing?
• What object is the person on the left standing on?
• Why is this a funny picture?
Challenges: Many nuisance parameters
Illumination, object pose, clutter, occlusions, intra-class appearance, viewpoint
Challenges: Intra-class variation
Handling challenges?
Computer vision is still very far from human-like perception. But we have:
• Lots and lots and lots of data.
• We can learn from humans.
• We have prior knowledge, which can provide constraints on the problem.
Computer Vision Technology Can Better Our Lives
Safety, Health, Security, Comfort, Fun, Access
History of Computer Vision
“In 1966, Minsky hired a first-year undergraduate student and assigned him a problem to solve over the summer: connect a camera to a computer and get the machine to describe what it sees.”
Crevier 1993, pg. 88
Marvin Minsky, MIT, Turing award, 1969
Half a century later,
we're still working on it.
1960’s: interpretation of synthetic worlds
Larry Roberts, “Father of Computer Vision”
[Figure panels: input image; 2x2 gradient operator; computed 3D model rendered from new viewpoint]
Larry Roberts PhD Thesis, MIT, 1963, Machine Perception of Three-Dimensional Solids
Slide credit: Steve Seitz
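The "2x2 gradient operator" mentioned above is commonly known today as the Roberts cross. The following is a minimal sketch of that operator in its usual textbook form; whether the slide's figure was produced with exactly this formulation is an assumption, and the function name and toy image are illustrative.

```python
import math

def roberts_cross(img):
    """Gradient magnitude from two 2x2 diagonal difference kernels.

    Kernels applied to each 2x2 neighbourhood:
        gx = [[+1, 0], [0, -1]]    gy = [[0, +1], [-1, 0]]
    The output is one row and one column smaller than the input.
    """
    rows, cols = len(img), len(img[0])
    out = []
    for y in range(rows - 1):
        out_row = []
        for x in range(cols - 1):
            gx = img[y][x] - img[y + 1][x + 1]      # main-diagonal difference
            gy = img[y][x + 1] - img[y + 1][x]      # anti-diagonal difference
            out_row.append(math.hypot(gx, gy))
        out.append(out_row)
    return out

# Toy image: a bright square in the lower-right corner.
image = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
edges = roberts_cross(image)
```

The response is zero in flat regions (both the dark background and the interior of the bright square) and nonzero only along the square's boundary, which is the kind of edge map a line-based 3D interpretation would start from.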
1970’s: some progress on interpreting selected images
The representation and matching of pictorial structures
Fischler and Elschlager, 1973
1980’s: ANNs come and go; shift toward
geometry and increased mathematical rigor
Image credit: Rick Szeliski
1990’s: face recognition; statistical analysis in vogue
2000’s: broader recognition; large annotated
datasets available; video processing starts
2010’s: resurgence of deep learning
[AlexNet NIPS 2012] [DeepFace CVPR 2014]
[DeepPose CVPR 2014] [Show, Attend and Tell ICML 2015]
2020’s: autonomous vehicles
2030’s: robot uprising?
Examples of Computer Vision Applications
• How is computer vision used today?
Face detection
• Most digital cameras and smart phones detect faces (and more)
• Canon, Sony, Fuji, …
• For smart focus, exposure compensation, and cropping
Slide credit: Steve Seitz
Face recognition
Facebook face auto-tagging
Face Landmark Alignment – 3D Persona
What Makes Tom Hanks Look Like Tom Hanks ICCV 2015
Smile Detection
Sony Cyber-shot® T70 Digital Still Camera
Slide credit: Steve Seitz
Vision-based Biometrics
“How the Afghan Girl was Identified by Her Iris Patterns”
Slide credit: Steve Seitz
Optical Character Recognition (OCR)
• Technology to convert scanned docs to text
• If you have a scanner, it probably came with OCR software
Digit recognition, AT&T labs: http://www.research.att.com/~yann/
License plate readers: http://en.wikipedia.org/wiki/Automatic_number_plate_recognition
Slide credit: Steve Seitz
Computer vision in sports
• Hawk-Eye: helping/improving referee decisions
• SportVision: improving viewer experiences
• Replay Technologies: improving viewer experiences
• Play tracking
Visual recognition for photo organization
Google Photos
Earth viewers (3D modeling)
Image from Microsoft’s Virtual Earth
(see also: Google Earth)
Slide credit: Steve Seitz
3D from thousands of images
[Furukawa et al. CVPR 2010]
3D Time-lapse from Internet Photos
3D Time-lapse from Internet Photos, ICCV 2015
Style transfer
Source image (Style) Target image (Content) Output (deepart)
A Neural Algorithm of Artistic Style [Gatys et al. 2015]
Special effects: Matting and composition
Kylie Minogue - Come Into My World
Special effects: Shape capture
The Matrix movies, ESC Entertainment, XYZRGB, NRC
Slide credit: Steve Seitz
Special effects: Motion capture
Pirates of the Caribbean, Industrial Light and Magic
Slide credit: Steve Seitz
Google cars
Google in talks with Ford, Toyota and Volkswagen to realise driverless cars
http://www.theatlantic.com/technology/archive/2014/05/all-the-world-a-track-the-trick-that-makes-googles-self-driving-cars-work/370871/
Interactive Games: Kinect
• Object Recognition: http://www.youtube.com/watch?feature=iv&v=fQ59dXOo63o
• Mario: http://www.youtube.com/watch?v=8CTJL5lUjHg
• 3D: http://www.youtube.com/watch?v=7QrnwoO1-8A
• Robot: http://www.youtube.com/watch?v=w8BmgtMKFbY
Vision in space
NASA'S Mars Exploration Rover Spirit captured this westward view from atop
a low plateau where Spirit spent the closing months of 2007.
Vision systems (JPL) used for several tasks
• Panorama stitching
• 3D terrain modeling
• Obstacle detection, position tracking
• For more, read “Computer Vision on Mars” by Matthies et al.
Industrial robots
Vision-guided robots position nut runners on wheels
http://www.automationworld.com/computer-vision-opportunity-or-threat
Mobile robots
NASA’s Mars Spirit Rover
http://www.robocup.org/
Saxena et al. 2008, STAIR at Stanford
http://www.youtube.com/watch?v=DF39Ygp53mQ
Medical imaging
3D imaging: MRI, CT
Image guided surgery: Grimson et al., MIT
Computer vision for the masses
Counting cells
Predicting poverty
Current state of the art
• Many of these are less than 5 years old
• Very active and exciting research area!
• To learn more about vision applications and companies
– David Lowe maintains an excellent overview of vision companies
• http://www.cs.ubc.ca/spider/lowe/vision.html
Topics of Study in Computer Vision
• Interpreting Intensities
– What determines the brightness and color of a pixel?
– How can we use image filters to extract meaningful information from the image?
• Correspondence and Alignment
– How can we find corresponding points in objects or scenes?
– How can we estimate the transformation between them?
• Perspective and 3D Geometry
– How can we map between the 3D world and the 2D image?
– How can we recover 3D coordinates from images or video?
• Grouping and Segmentation
– How can we group pixels into meaningful regions?
• Categorization and Object Recognition
– How can we represent images and categorize them?
– How can we recognize categories of objects?
• Advanced Topics
– Action recognition, 3D scenes and context, CNNs, …
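As one illustration of the "Perspective and 3D Geometry" questions above, here is a minimal sketch of mapping a 3D point to a 2D image point. It assumes an ideal pinhole camera looking down the +Z axis with the principal point at the image origin; the function name, the focal length value, and the sample points are hypothetical and chosen only for illustration.

```python
def project_point(point_3d, focal_length):
    """Project a camera-frame 3D point (X, Y, Z) onto the image plane.

    Ideal pinhole model (an assumption of this sketch):
        x = f * X / Z,   y = f * Y / Z
    """
    X, Y, Z = point_3d
    if Z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    return (focal_length * X / Z, focal_length * Y / Z)

# Two points on the same ray through the camera center, at different depths.
near = project_point((1.0, 2.0, 4.0), focal_length=2.0)
far = project_point((2.0, 4.0, 8.0), focal_length=2.0)
```

Both points land on the same image location, which shows why a single image does not determine depth and why recovering 3D coordinates typically calls for extra information such as multiple views or prior knowledge.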
Resources
Books
• “Computer Vision: A Modern Approach”, by D. A. Forsyth, J. Ponce.
• “Digital Image Processing: An Algorithmic Approach” by Madhuri A. Joshi
Videos
• https://www.youtube.com/playlist?list=PLyqSpQzTE6M_PI-rIz4O1jEgffhJU9GgG