CS 598 3D Vision:
Multi-View Geometry
Shenlong Wang
UIUC
Some materials borrowed from Matthew O’Toole, Kris Kitani, Jianxiong Xiao, Derek Hoiem, Sanja Fidler
Logistics
• Quiz 1 – If we didn’t reach out, it's satisfactory!
• Quiz 2 – Will be out tonight (due next Tuesday).
• Group assignment is out!
• Survey due date has been extended (Sept 26 → Oct 3). Meet early to
conduct a literature review and select 25+ papers, then organize them into
groups and assign tasks within your group.
• Role-playing group: 1) discuss your plan with us during Thursday
office hours the week before your presentation, or arrange a quick ad-hoc
meeting; 2) share your presentation for feedback three days before your
group presentation.
Today’s Agenda
● Camera Calibration
● Structure from Motion
● Other Cameras
Big picture: 3 key components in 3D
3D Points (Structure)
Camera (Motion)
Correspondences
Credit: Angjoo Kanazawa
Camera Calibration
How do I know K?
[Figure: a 3D object, the image plane, and the camera center, labeled with the intrinsic parameters and extrinsic parameters]
Camera Calibration
● Inputs: A collection of images with points whose 2D image coordinates and 3D world
coordinates are known.
● Outputs: The 3×3 camera intrinsic matrix, and the rotation and translation of each image.
Capture multiple images of the checkerboard from different viewpoints → Find checkerboard corners → Find camera parameters by minimizing the 3D-2D reprojection error
Image credit: OpenCV
Camera Calibration
● Minimizing the reprojection error
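$\min_{K, \{R_i, t_i\}} \sum_i \sum_j \| x_{ij} - \pi(K, R_i, t_i, X_j) \|^2$
where $(R_i, t_i)$ are the extrinsics of image $i$, $K$ is the intrinsic matrix, $x_{ij}$ are the detected corner points, $\pi$ is the perspective projection, and $X_j$ are the known 3D locations.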
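To make the reprojection error concrete, here is a minimal sketch that evaluates it for a single image. It assumes a plain pinhole model without lens distortion; the function names `project` and `reprojection_error`, the intrinsics, the pose, and the 3×3 corner grid are all made-up illustration values, not from the lecture.

```python
import numpy as np

def project(K, R, t, X):
    """Pinhole projection of (N, 3) world points to (N, 2) pixel coordinates."""
    X_cam = X @ R.T + t          # extrinsics: world frame -> camera frame
    uvw = X_cam @ K.T            # intrinsics: camera frame -> homogeneous pixels
    return uvw[:, :2] / uvw[:, 2:3]

def reprojection_error(K, R, t, X, x_detected):
    """Sum of squared 2D distances between detected and reprojected points."""
    return float(np.sum((x_detected - project(K, R, t, X)) ** 2))

# Made-up example values (assumptions for illustration only)
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 5.0])

# A 3x3 planar grid of "checkerboard corners" on the plane Z = 0
gx, gy = np.meshgrid(np.arange(3.0), np.arange(3.0))
X = np.stack([gx.ravel(), gy.ravel(), np.zeros(9)], axis=1)

x_detected = project(K, R, t, X)      # noise-free synthetic detections
err_true = reprojection_error(K, R, t, X, x_detected)

K_wrong = K.copy()
K_wrong[0, 0] = 850.0                 # perturb the focal length
err_wrong = reprojection_error(K_wrong, R, t, X, x_detected)
```

Calibration searches for the parameters that drive this error down over all images; with the true parameters and noise-free detections the error vanishes, while the perturbed focal length gives a strictly larger value.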
Camera Calibration
https://www.mathworks.com/help/vision/camera-calibration.html
https://github.com/ethz-asl/kalibr
https://docs.opencv.org/4.x/dc/dbb/tutorial_py_calibration.html
Structure-from-Motion
Each pair of 2D-2D correspondences establishes a triangular relationship
Structure-from-Motion
Image credit: Jianxiong Xiao
Structure-from-Motion
● Establish 2D-2D correspondences across images
● Jointly refine camera pose and 3D points in an optimization framework
Two View Reconstruction
Match
Bundle Adjustment
Keypoints Detection
● Step 1: Detect distinctive keypoints that are suitable for matching
Descriptor for each point
● Step 2: Compute visual descriptors (SIFT features)
Match Points
● Step 3: Measure pairwise distance / similarity between features
Match Points
SIFT (scale-invariant feature transform)
● Step 1: Detect distinctive keypoints that are suitable for matching
● Step 2: Compute histograms of oriented gradients as features
● Step 3: Measure the distance between each pair of features
Match Points
● How many pairwise matches do I need to compute?
Match Points
● What if there are bad matches?
Match Points in Practice
How can we make SIFT matching faster than exhaustive search?
- Approximate nearest neighbor search
- Hashing, KD-tree, etc.
How can we ensure that a match is good?
- Ratio test: the nearest neighbor should be much closer than the second-best candidate
- Consistency check: (1) keypoint A’s nearest neighbor in image 2 is keypoint
B; (2) keypoint B’s nearest neighbor in image 1 is keypoint A.
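The two match filters can be sketched as follows. This is a brute-force illustration on synthetic descriptors, not an approximate nearest neighbor search; the function name `match_descriptors`, the ratio threshold of 0.8, and the random 128-D descriptors are assumptions for the example.

```python
import numpy as np

def match_descriptors(desc1, desc2, ratio=0.8):
    """Return (i, j) index pairs that pass the ratio test and the mutual check."""
    # Pairwise Euclidean distances, shape (N1, N2)
    d = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)
    nn12 = np.argmin(d, axis=1)   # nearest neighbor in image 2 for each keypoint of image 1
    nn21 = np.argmin(d, axis=0)   # nearest neighbor in image 1 for each keypoint of image 2
    matches = []
    for i, j in enumerate(nn12):
        sorted_d = np.sort(d[i])
        # Ratio test: best distance must be clearly smaller than the second best
        passes_ratio = len(sorted_d) < 2 or sorted_d[0] < ratio * sorted_d[1]
        # Consistency check: j's nearest neighbor back in image 1 must be i
        is_mutual = nn21[j] == i
        if passes_ratio and is_mutual:
            matches.append((i, int(j)))
    return matches

# Synthetic descriptors: image 2 holds a shuffled, slightly noisy copy of image 1
rng = np.random.default_rng(0)
desc_a = rng.normal(size=(5, 128))
perm = np.array([2, 0, 4, 1, 3])
desc_b = desc_a[perm] + 0.01 * rng.normal(size=(5, 128))

# One extra, deliberately ambiguous descriptor: equally close to two candidates
ambiguous = 0.5 * (desc_b[0] + desc_b[1])
desc_a = np.vstack([desc_a, ambiguous])

matches = match_descriptors(desc_a, desc_b)
```

The five genuine keypoints are matched to their shuffled counterparts, while the ambiguous sixth descriptor is rejected by the ratio test because its two best candidates are equally distant.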
Two View Reconstruction
Match
Bundle Adjustment
Fundamental Matrix
Eight-Point Algorithm
• Given a correspondence
• Assume
• We can get
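For reference, one standard way to write out this step is given here. The notation is an assumption, since the slide's own equations are not available: the convention $\mathbf{x}'^{\top} F \mathbf{x} = 0$, homogeneous pixel coordinates with last entry 1, and a row-major stacking of $F$ into $\mathbf{f}$.

```latex
% Epipolar constraint for one correspondence x <-> x'
\mathbf{x}'^{\top} F \,\mathbf{x} = 0, \qquad
\mathbf{x} = (u, v, 1)^{\top}, \quad \mathbf{x}' = (u', v', 1)^{\top}

% Expanding the product gives one linear equation in the nine entries of F
\begin{bmatrix} u'u & u'v & u' & v'u & v'v & v' & u & v & 1 \end{bmatrix} \mathbf{f} = 0,
\qquad
\mathbf{f} = (F_{11}, F_{12}, F_{13}, F_{21}, F_{22}, F_{23}, F_{31}, F_{32}, F_{33})^{\top}
```

Each correspondence therefore contributes one row to a homogeneous linear system in $\mathbf{f}$.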
Eight-Point Algorithm
• Given 8 correspondences, stack the constraints into a linear system $A f = 0$
• We want a nontrivial solution ($f \neq 0$)
• f is in the null space of A
• Solve with SVD!
Eight-Point Algorithm
● Rank constraint: $\operatorname{rank}(F) = 2$
● Minimize the Frobenius norm $\| F - F' \|_F$ subject to $\operatorname{rank}(F') = 2$
Rank Constraint
[Figure: before and after enforcing the rank constraint]
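A minimal NumPy sketch of the eight-point algorithm with the rank-2 projection is given here. It assumes the convention $x_2^\top F x_1 = 0$ and skips coordinate normalization (often recommended in practice for pixel coordinates); the function name `eight_point` and the synthetic two-camera scene with identity intrinsics are illustrative assumptions.

```python
import numpy as np

def eight_point(x1, x2):
    """Estimate F from N >= 8 correspondences so that x2_h^T F x1_h = 0.

    x1, x2: (N, 2) arrays of matching points in image 1 and image 2.
    """
    u, v = x1[:, 0], x1[:, 1]
    up, vp = x2[:, 0], x2[:, 1]
    ones = np.ones_like(u)
    # One row per correspondence, matching a row-major flattening of F
    A = np.stack([up * u, up * v, up, vp * u, vp * v, vp, u, v, ones], axis=1)
    # f is the right singular vector with the smallest singular value
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Rank constraint: zero the smallest singular value, which gives the
    # closest rank-2 matrix in Frobenius norm
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    return U @ np.diag(S) @ Vt

# Synthetic scene (assumed values): 12 points seen by two cameras with K = I
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(12, 3)) + np.array([0.0, 0.0, 5.0])
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([1.0, 0.0, 0.2])

x1 = X[:, :2] / X[:, 2:]
X2 = X @ R.T + t
x2 = X2[:, :2] / X2[:, 2:]

F = eight_point(x1, x2)

# Epipolar residuals |x2_h^T F x1_h| for every correspondence
x1h = np.hstack([x1, np.ones((12, 1))])
x2h = np.hstack([x2, np.ones((12, 1))])
residuals = np.abs(np.einsum('ni,ij,nj->n', x2h, F, x1h))
smallest_singular_value = np.linalg.svd(F, compute_uv=False)[2]
```

On noise-free data the residuals are at numerical precision and the rank-2 projection changes almost nothing; with noisy matches the projection is what makes all epipolar lines meet in a single epipole.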
RANSAC Estimation
● Repeat many times:
○ Pick 8 points
○ Compute a solution for F using these 8 points
○ Count the number of inliers, i.e. correspondences whose geometric error is close to 0
● Pick the solution with the largest number of inliers
● Keep only the inliers as correspondences
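The RANSAC loop can be sketched as follows. The Sampson error is used here as one possible choice of geometric error, and the helper names (`fit_F`, `sampson_error`, `ransac_F`), the iteration count, the threshold, and the synthetic data with 6 corrupted matches are all assumptions for illustration.

```python
import numpy as np

def fit_F(x1h, x2h):
    """Eight-point estimate of F (x2h^T F x1h = 0) with rank-2 projection."""
    A = np.einsum('ni,nj->nij', x2h, x1h).reshape(len(x1h), 9)
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(F)
    return U @ np.diag([S[0], S[1], 0.0]) @ Vt

def sampson_error(F, x1h, x2h):
    """First-order approximation of the squared geometric error per match."""
    Fx1 = x1h @ F.T     # rows are F @ x1
    Ftx2 = x2h @ F      # rows are F^T @ x2
    num = np.sum(x2h * Fx1, axis=1) ** 2
    den = Fx1[:, 0]**2 + Fx1[:, 1]**2 + Ftx2[:, 0]**2 + Ftx2[:, 1]**2
    return num / (den + 1e-30)

def ransac_F(x1h, x2h, n_iters=200, thresh=1e-10, seed=0):
    rng = np.random.default_rng(seed)
    best_F, best_inliers = None, np.zeros(len(x1h), dtype=bool)
    for _ in range(n_iters):
        idx = rng.choice(len(x1h), size=8, replace=False)   # pick 8 points
        F = fit_F(x1h[idx], x2h[idx])                       # solve for F
        inliers = sampson_error(F, x1h, x2h) < thresh       # count inliers
        if inliers.sum() > best_inliers.sum():
            best_F, best_inliers = F, inliers
    return best_F, best_inliers

# Synthetic matches (assumed values): 24 correct, followed by 6 random outliers
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(30, 3)) + np.array([0.0, 0.0, 4.0])
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([1.0, 0.0, 0.2])
x1 = X[:, :2] / X[:, 2:]
X2 = X @ R.T + t
x2 = X2[:, :2] / X2[:, 2:]
x2[24:] = rng.uniform(-0.3, 0.3, size=(6, 2))   # corrupt the last 6 matches

x1h = np.hstack([x1, np.ones((30, 1))])
x2h = np.hstack([x2, np.ones((30, 1))])
F, inliers = ransac_F(x1h, x2h)
```

The threshold is tiny only because the synthetic inliers are noise-free; with real detections it would be set according to the expected pixel noise.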
Essential Matrix
Essential Matrix Decomposition
● Decompose the essential matrix E into R and t
Try to verify this yourself.
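One way to check the decomposition numerically is the standard SVD-based recipe sketched here. It assumes the convention $E = [t]_\times R$; the function names and the test pose are made up for the example. The recipe returns four candidate poses, and the physically valid one is usually selected by requiring triangulated points to lie in front of both cameras, which is not shown.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x such that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def decompose_essential(E):
    """Return the four candidate (R, t) pairs for an essential matrix E."""
    U, _, Vt = np.linalg.svd(E)
    # Flip signs if needed so that the recovered rotations have det = +1
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
    R1, R2 = U @ W @ Vt, U @ W.T @ Vt
    t = U[:, 2]          # translation direction, known only up to sign and scale
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]

# Made-up ground-truth pose used to build E
c, s = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t_true = np.array([1.0, 0.2, 0.1])
t_true = t_true / np.linalg.norm(t_true)
E = skew(t_true) @ R_true

candidates = decompose_essential(E)
found = any(np.allclose(R, R_true, atol=1e-8) and np.allclose(t, t_true, atol=1e-8)
            for R, t in candidates)
```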
Extending to Multiple Views
Image credit: Noah Snavely
Multi-view Triangulation
Are we guaranteed to converge?
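As a reference point for the triangulation step itself, here is a minimal sketch of linear (DLT) triangulation of one point from several views with known projection matrices. This is one common formulation and not necessarily the one used on the slide; the function name `triangulate` and the three made-up cameras are assumptions.

```python
import numpy as np

def triangulate(Ps, xs):
    """Linear (DLT) triangulation of one 3D point from two or more views.

    Ps: list of 3x4 projection matrices; xs: list of (u, v) pixel observations.
    """
    rows = []
    for P, (u, v) in zip(Ps, xs):
        # Each view gives two linear constraints on the homogeneous point X
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.array(rows)
    X = np.linalg.svd(A)[2][-1]      # null vector of A (least squares if noisy)
    return X[:3] / X[3]

# Three made-up cameras sharing the same intrinsics
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
c, s = np.cos(0.1), np.sin(0.1)
R2 = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
Ps = [
    K @ np.hstack([np.eye(3), np.zeros((3, 1))]),
    K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])]),
    K @ np.hstack([R2, np.array([[0.5], [-0.2], [0.1]])]),
]

X_true = np.array([0.3, -0.2, 4.0])
xs = []
for P in Ps:
    p = P @ np.append(X_true, 1.0)
    xs.append((p[0] / p[2], p[1] / p[2]))

X_hat = triangulate(Ps, xs)
```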
Multi-view Triangulation
What could be changed?
Bundle Adjustment
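$\min_{\{R_i, t_i\}, \{X_j\}} \sum_i \sum_j \| x_{ij} - \pi(K, R_i, t_i, X_j) \|^2$
where $(R_i, t_i)$ are the extrinsics of image $i$, $X_j$ are the 3D points, $x_{ij}$ are the detected keypoints, $\pi$ is the perspective projection, and the 3D locations $X_j$ are unknown.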
What is the difference between calibration and structure from motion?
Continuous Optimization
MAP inference: find the best configuration that minimizes the energy
There is no universal solution. The choice of inference algorithm depends on:
● Continuous vs. discrete variables: numerical or search-based approaches
● Energy functions: convex, submodular, piecewise linear, quadratic, etc.
● Graphical model structures: whether they contain loops or high-order terms
MAP Inference: Gradient Descent
● Minimize continuous-valued energy-based models by numerical optimization: $x_{t+1} = x_t - \eta \nabla E(x_t)$ (step size $\eta$)
● Pros: simple and straightforward, works for all differentiable energies
● Cons: requires (sub-)differentiability and converges slowly
MAP Inference: Newton Method
● For a twice-differentiable energy function, one could use Newton’s method: $x_{t+1} = x_t - [\nabla^2 E(x_t)]^{-1} \nabla E(x_t)$
● Pros: captures curvature, converges better, is less likely to get stuck, needs less tuning
● Cons: expensive to compute the inverse Hessian, hard to scale
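A small numerical comparison of gradient descent and Newton's method is sketched here. The quadratic energy, the step size, and the stopping tolerance are made-up illustration values; on a quadratic, a single Newton step lands exactly on the minimizer, while gradient descent needs many small steps along the flat direction.

```python
import numpy as np

# Hypothetical quadratic energy E(x) = 0.5 * x^T A x - b^T x (ill-conditioned on purpose)
A = np.array([[10.0, 0.0],
              [0.0, 1.0]])
b = np.array([1.0, 1.0])

def grad(x):
    return A @ x - b

def hess(x):
    return A                      # constant Hessian for a quadratic

# Gradient descent with a fixed step size
step = 0.1
x_gd = np.zeros(2)
gd_iters = 0
for _ in range(10_000):
    g = grad(x_gd)
    if np.linalg.norm(g) < 1e-8:
        break
    x_gd = x_gd - step * g
    gd_iters += 1

# Newton's method: solve H @ delta = grad rather than forming the inverse Hessian
x_newton = np.zeros(2)
x_newton = x_newton - np.linalg.solve(hess(x_newton), grad(x_newton))

x_star = np.linalg.solve(A, b)    # closed-form minimizer for reference
```

Solving the linear system at every step is what makes Newton's method costly when the number of variables is large.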
MAP Inference: Gauss-Newton
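● If the energy has a sum-of-squares form: $E(x) = \| r(x) \|^2 = \sum_i r_i(x)^2$
● For each iteration t:
○ Taylor approximation of the residual function: $r(x_t + \Delta) \approx r(x_t) + J(x_t) \Delta$
○ Solve the least-squares problem: $\min_\Delta \| r(x_t) + J(x_t) \Delta \|^2$
How to get the solution? Today’s Quiz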
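A minimal Gauss-Newton loop on a toy curve-fitting problem can be sketched as follows. The exponential model, the data, and the function names are assumptions for illustration, and the linearized least-squares step is delegated to `np.linalg.lstsq` rather than written out in closed form.

```python
import numpy as np

def gauss_newton(residual, jacobian, x0, n_iters=20):
    """Minimize ||residual(x)||^2 by repeatedly linearizing the residual."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        r = residual(x)
        J = jacobian(x)
        # Linearized problem: minimize ||r + J @ delta||^2 over delta
        delta = np.linalg.lstsq(J, -r, rcond=None)[0]
        x = x + delta
    return x

# Hypothetical model y = a * exp(b * s) with noise-free samples
s = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-1.5 * s)

def residual(p):
    return p[0] * np.exp(p[1] * s) - y

def jacobian(p):
    e = np.exp(p[1] * s)
    # Columns: derivative of the residual w.r.t. a and w.r.t. b
    return np.stack([e, p[0] * s * e], axis=1)

p_hat = gauss_newton(residual, jacobian, x0=[1.0, 0.0])
```

Bundle adjustment has the same sum-of-squares structure, with the reprojection residuals of all observations stacked into `r`.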
Multi-View Stereo
● Input: images from several viewpoints
with known camera poses and calibration
● Output: 3D object model
Why are SFM 3D points insufficient?
Measuring the matching cost
COLMAP: Photometric + Geometric Cost + View Selection
● Photometric consistency: normalized cross-correlation
● Geometric consistency: forward-backward reprojection error
Pixelwise View Selection for Unstructured Multi-View Stereo, 2016
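For intuition about the photometric term, here is a minimal sketch of plain normalized cross-correlation between two patches. It is not COLMAP's implementation; the function name `ncc`, the `eps` guard, and the random 7×7 patches are assumptions.

```python
import numpy as np

def ncc(patch_a, patch_b, eps=1e-8):
    """Normalized cross-correlation of two equally sized patches, in [-1, 1]."""
    a = patch_a.astype(float).ravel()
    b = patch_b.astype(float).ravel()
    a = a - a.mean()              # remove brightness offset
    b = b - b.mean()
    # Dividing by the norms removes contrast (gain) differences
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

rng = np.random.default_rng(0)
patch = rng.uniform(size=(7, 7))

score_same = ncc(patch, 2.0 * patch + 10.0)   # same pattern, different gain and offset
score_inverted = ncc(patch, -patch)           # inverted pattern
score_random = ncc(patch, rng.uniform(size=(7, 7)))
```

Because the score is invariant to affine brightness changes, it is a convenient matching cost across views with different exposure.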
MVSNet
MVSNet: Depth Inference for Unstructured Multi-view Stereo, 2018
3D Reconstruction: SFM + MVS
Image credit: Google, Michael Kaess
Visual SLAM: Online SFM
Image credit: Yang et al. ICRA 2021
Camera Distortion
Image credit: OpenCV
Camera Distortion
● Remember to cv2.undistort the image if you want to reason in 3D.
Image credit: OpenCV
Event Cameras
Image credit: Davide Scaramuzza
Fisheye Camera / Omnidirectional Camera
Image credit: OZ robotics
What I Didn’t Cover
● Stereo Rectification
Making two stereo cameras fronto-parallel.
● Five-Point Algorithm
Recover the essential matrix from five 2D-2D correspondences
● Projection Matrix Decomposition
Recover R and t from the camera projection matrix
● Essential Matrix Decomposition
Recover R and t from the estimated essential matrix
● Perspective-n-Point (PnP)
Recover R and t from 2D-3D correspondences
See Szeliski or the MVG book to learn more about these concepts