CS 294-167 Spring 2022
Geometry and Learning for 3D Vision
Yi Ma
UC Berkeley
MASKS © 2004 Invitation to 3D vision
Course Information
• Course Piazza:
https://piazza.com/berkeley/spring2022/cs294167/
(information, homework, lecture notes, and resources…)
• Office hours:
Monday, Tuesday 2-3pm (together with EE106B)
• Grading policy:
10% participation; 20% homework; 70% final project
• Prerequisite:
EECS280 or equivalent in computer vision or image processing
Undergraduate linear algebra, some familiarity with ML tools.
Main Textbook (on Piazza)
Supplementary Textbook
https://szeliski.org/Book/
Lecture 1
Overview and Introduction
Reconstruction from images – The Fundamental Problem
Input: Corresponding “features” in multiple perspective images.
Output: Camera poses, calibration, scene structure representations.
(3D point clouds, meshes, voxels, implicit surfaces, radiance fields…)
[Figure: example representations: point clouds, meshes, voxels, implicit surfaces, CAD-like models]
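Reconstruction inverts a forward image-formation model. A minimal sketch of that forward model, assuming an ideal pinhole camera with intrinsic calibration matrix K and pose (R, T), so that a world point X projects to pixel x with λx = K(RX + T). The function name `project` and all numeric values are illustrative assumptions, not taken from the course material:

```python
import numpy as np


def project(X_world, K, R, T):
    """Project N x 3 world points to N x 2 pixel coordinates (ideal pinhole model)."""
    X_cam = X_world @ R.T + T              # rigid-body motion: world frame -> camera frame
    x_hom = X_cam @ K.T                    # apply the intrinsic calibration matrix K
    return x_hom[:, :2] / x_hom[:, 2:3]    # perspective division by depth


# Hypothetical calibration: 500 px focal length, principal point at (320, 240)
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                    # camera axes aligned with the world axes
T = np.array([0.0, 0.0, 5.0])    # world origin 5 units in front of the camera

X = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])
pixels = project(X, K, R, T)     # [[320, 240], [420, 240]]
```

The reconstruction problem runs this in reverse: given only corresponding pixels across several images, recover K, the pose (R, T) of each view, and the points X.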
Reconstruction from images – The Fundamental Problem
Geometric relationships among multiple views of points, lines, and planes.
Geometric and algorithmic foundation for multiple-view geometry.
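The simplest of these relationships is the two-view epipolar constraint for points. Assuming calibrated cameras, normalized image coordinates x₁, x₂, and relative motion X₂ = RX₁ + T, every correspondence satisfies x₂ᵀ T̂ R x₁ = 0, where T̂ is the skew-symmetric matrix of T. The sketch below checks this numerically on synthetic points; the motion, the point ranges, and the helper names are arbitrary choices for illustration:

```python
import numpy as np


def hat(v):
    """Skew-symmetric matrix such that hat(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def rotation_z(theta):
    """Rotation by theta radians about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


rng = np.random.default_rng(0)
R = rotation_z(0.1)
T = np.array([1.0, 0.2, 0.0])
E = hat(T) @ R                                            # essential matrix

X1 = rng.uniform([-1, -1, 4], [1, 1, 8], size=(10, 3))   # points in the camera-1 frame
X2 = X1 @ R.T + T                                         # same points in the camera-2 frame
x1 = X1 / X1[:, 2:3]                                      # normalized homogeneous image coords
x2 = X2 / X2[:, 2:3]

# One residual x2^T E x1 per correspondence; all vanish up to round-off.
residuals = np.einsum("ni,ij,nj->n", x2, E, x1)
```

The derivation is one line: T̂X₂ = T̂RX₁, and X₂ᵀT̂X₂ = 0, so X₂ᵀT̂RX₁ = 0; dividing by the two depths gives the constraint on image points.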
Reconstruction from images – The Fundamental Problem
“Rome wasn’t built in a day.”
APPLICATIONS – Autonomous Highway Vehicles (1990-)
Image courtesy of California PATH
APPLICATIONS – Autonomous Vehicles Today
APPLICATIONS – Unmanned Aerial Vehicles (UAVs, 1998)
Rate: 10 Hz; Accuracy: 5 cm, 4°
Courtesy of Berkeley Robotics Lab
APPLICATIONS – Unmanned Aerial Vehicles (UAVs) Today
APPLICATIONS – Real-Time Virtual Object Insertion
UCLA Vision Lab
APPLICATIONS – Real-Time Sports Coverage
First-down line and virtual advertising
Princeton Video Image, Inc.
Virtual Museum on Your Phone
[Figure: multi-camera light stage; Shanghai Museum items on an iPhone VR kit]
APPLICATIONS – Image Based Modeling and Rendering
Image courtesy of Paul Debevec, 1996
APPLICATIONS – Image Alignment, Mosaicing, and Morphing
GENERAL STEPS – Feature Selection and Correspondence
1. Small baselines versus large baselines
2. Point features versus line features
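With small baselines, point features can often be tracked by local search between nearby frames; with large baselines, correspondence usually relies on matching feature descriptors. A minimal sketch of descriptor matching, assuming SIFT-like 128-D descriptors and using the nearest-neighbor ratio test from Lowe's SIFT paper (threshold 0.8) to reject ambiguous matches. The function name and the synthetic descriptors are hypothetical:

```python
import numpy as np


def match_ratio_test(desc1, desc2, ratio=0.8):
    """Nearest-neighbor descriptor matching with a ratio test.

    desc1: (N, D) and desc2: (M, D) arrays with M >= 2. A match (i, j) is kept
    only if the nearest neighbor j of desc1[i] is clearly closer than the
    second-nearest one, which discards ambiguous matches (e.g. repetitive patterns)."""
    dist = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)  # pairwise L2
    order = np.argsort(dist, axis=1)
    rows = np.arange(len(desc1))
    best, second = order[:, 0], order[:, 1]
    keep = dist[rows, best] < ratio * dist[rows, second]
    return [(int(i), int(best[i])) for i in rows[keep]]


rng = np.random.default_rng(0)
desc1 = rng.normal(size=(20, 128))                         # stand-ins for SIFT-like descriptors
perm = rng.permutation(20)
desc2 = desc1[perm] + 0.01 * rng.normal(size=(20, 128))    # shuffled, slightly perturbed copy
matches = match_ratio_test(desc1, desc2)                   # list of (index in desc1, index in desc2)
```

The brute-force distance matrix is O(NM) in memory; real systems typically use approximate nearest-neighbor search instead.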
GENERAL STEPS – Structure and Motion Recovery
1. Two views versus multiple views
2. Discrete versus continuous motion
3. General versus planar scene
4. Calibrated versus uncalibrated camera
5. One motion versus multiple motions
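For the simplest combination in this list (two views, discrete motion, general scene, calibrated camera, one motion), the classical linear eight-point algorithm estimates the essential matrix E = T̂R from at least eight correspondences by stacking the epipolar constraints x₂ᵀEx₁ = 0 into a homogeneous linear system. Below is a minimal, noise-free sketch; it omits the data normalization, outlier rejection (e.g. RANSAC), and decomposition of E into (R, T) that a practical implementation needs. Helper names and scene values are assumptions:

```python
import numpy as np


def hat(v):
    """Skew-symmetric matrix such that hat(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def rodrigues(axis, angle):
    """Rotation matrix about an axis (normalized here) by Rodrigues' formula."""
    k = hat(axis / np.linalg.norm(axis))
    return np.eye(3) + np.sin(angle) * k + (1.0 - np.cos(angle)) * k @ k


def eight_point(x1, x2):
    """Linear estimate of E (up to scale) from N >= 8 normalized homogeneous
    correspondences x1[n] <-> x2[n] satisfying x2^T E x1 = 0."""
    # Row n holds the coefficients x2[n, a] * x1[n, b] of E[a, b] (row-major E).
    A = np.einsum("na,nb->nab", x2, x1).reshape(-1, 9)
    _, _, Vt = np.linalg.svd(A)
    E = Vt[-1].reshape(3, 3)            # right singular vector of the smallest singular value
    # Project onto the essential space: singular values (1, 1, 0).
    U, _, Vt = np.linalg.svd(E)
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt


# Synthetic, noise-free scene (arbitrary values)
rng = np.random.default_rng(0)
R = rodrigues(np.array([0.2, 1.0, 0.1]), 0.3)
T = np.array([1.0, 0.1, 0.3])
X1 = rng.uniform([-2, -2, 4], [2, 2, 10], size=(20, 3))   # points in the camera-1 frame
X2 = X1 @ R.T + T                                         # same points in the camera-2 frame
x1 = X1 / X1[:, 2:3]
x2 = X2 / X2[:, 2:3]

E_est = eight_point(x1, x2)
E_true = hat(T) @ R      # E_est should match this up to scale and sign
```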
GENERAL STEPS – Image Stratification and Dense Matching
[Figure: left and right images of a stereo pair]
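For a rectified pair, where corresponding points lie on the same image row, dense matching reduces to a 1-D search for each pixel's horizontal shift (disparity) d, and depth follows from Z = f·B/d, with f the focal length in pixels and B the baseline. A brute-force sum-of-absolute-differences (SAD) block matcher is sketched below; it is illustrative and slow, and the function names and parameters (`max_disp`, `half`, `focal_px`, `baseline`) are assumptions rather than part of the course material:

```python
import numpy as np


def block_match(left, right, max_disp, half=2):
    """Brute-force SAD block matching on a rectified grayscale pair.

    For each left pixel, compare its (2*half+1)^2 window with right-image
    windows shifted left by d = 0..max_disp and keep the cheapest d.
    Pixels too close to the border keep disparity 0."""
    h, w = left.shape
    disp = np.zeros((h, w), dtype=int)
    for y in range(half, h - half):
        for x in range(half + max_disp, w - half):
            patch = left[y - half:y + half + 1, x - half:x + half + 1]
            costs = [np.abs(patch - right[y - half:y + half + 1,
                                          x - d - half:x - d + half + 1]).sum()
                     for d in range(max_disp + 1)]
            disp[y, x] = int(np.argmin(costs))
    return disp


def disparity_to_depth(disp, focal_px, baseline):
    """Depth Z = f * B / d; zero disparity maps to infinite depth."""
    depth = np.full(disp.shape, np.inf)
    valid = disp > 0
    depth[valid] = focal_px * baseline / disp[valid]
    return depth


# Synthetic pair: the right image is the left one shifted 5 px to the left.
rng = np.random.default_rng(0)
left = rng.random((20, 40))
right = np.roll(left, -5, axis=1)
disp = block_match(left, right, max_disp=8)
depth = disparity_to_depth(disp, focal_px=700.0, baseline=0.12)
```

Textureless or repetitive regions make the SAD cost flat or multi-modal, which is one reason dense matching is hard in practice.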
GENERAL STEPS – 3-D Surface Model and Rendering
1. Point clouds versus surfaces (level sets)
2. Random shapes versus regular structures
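To relate these representations: an implicit surface is the zero level set of a function f(x), and both a voxel grid and a point cloud can be sampled from it. A minimal sketch, assuming a sphere given by its signed distance function f(x) = ‖x − c‖ − r; the grid resolution, radius, and function name are arbitrary illustrative choices:

```python
import numpy as np


def sphere_sdf(points, center, radius):
    """Signed distance to a sphere: negative inside, zero on the surface, positive outside."""
    return np.linalg.norm(points - center, axis=-1) - radius


n = 32
axis = np.linspace(-1.0, 1.0, n)
grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)   # (n, n, n, 3)
sdf = sphere_sdf(grid, center=np.zeros(3), radius=0.6)

voxels = sdf <= 0.0                                     # occupancy grid (voxel representation)
voxel_size = axis[1] - axis[0]
surface_pts = grid[np.abs(sdf) < 0.5 * voxel_size]      # thin shell around f = 0 (point cloud)
```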
GENERAL STEPS – Image-Based 3D Modeling
Building Rome in One Day
The Colosseum, 2,106 images
Steve Seitz, University of Washington, Richard Szeliski, Microsoft Research
Traditional 3D Reconstruction Pipeline
Feature Extraction & Matching → Multiview Geometry → Point Clouds
Image Source: Internet
Limitations of Traditional 3D Reconstruction
• Textureless objects
• Reflection/transparency
• Repetitive patterns
• Medium/large baselines (SIFT failure)
• Moving objects
Image source: Internet
Deep Learning (Data-Driven) Approaches
• Pose estimation: Kehl, W., et al. (2017)
• Voxels: Song, S., et al. (2017)
• Point clouds: Qi, C. R., et al. (2017)
• 3D bounding cube: Mousavian, A., et al. (2019)
• Depth map regression: Li, Z., & Snavely, N. (2018)
• Meshes: Groueix, T., et al. (2018)
• Implicit surfaces: Wang, W., et al. (2019)
Challenges for Data-driven Approaches
• Recent research [1] suggests that encoder-decoder networks for single-view 3D reconstruction do not actually perform reconstruction but rather classification
• CNNs do not outperform well-designed nearest-neighbor (retrieval) baselines
• They cannot exploit geometric structure
[Figure: ground truth compared with AtlasNet, OGN, Matryoshka, Clustering, Retrieval, and Oracle NN]
[1] Maxim Tatarchenko, Stephan R. Richter, René Ranftl, Zhuwen Li, Vladlen Koltun, Thomas Brox. “What Do Single-view 3D Reconstruction Networks Learn?” arXiv preprint arXiv:1905.03678 (2019).
We Live in a Highly Structured World
• Man-made environments are rich in structural regularities:
  o Straight lines
  o Smooth curves
  o Parallelism
  o Orthogonality
  o Symmetry
• How to detect & utilize them?
Image source: Internet
Symmetry based Modeling & Reconstruction
Regular-Structure-Based Modeling & Reconstruction
[Figure: 360° panorama]
TILT: Transform-Invariant Low-rank Textures, Z. Zhang, Y. Ma, et al., IJCV 2012
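The premise behind TILT is that a regular texture, viewed in its rectified (frontal) form and treated as a matrix, has low rank, while a geometric distortion raises that rank; TILT searches for the transform that makes the texture low-rank (with a sparse error term). The toy example below illustrates only this premise, not the TILT algorithm itself: it compares the numerical rank of a stripe pattern with that of a sheared copy, where a wrap-around per-row shift is a crude stand-in for a real perspective distortion:

```python
import numpy as np

# Vertical stripes with a period of 8 pixels: every row is identical.
row = ((np.arange(32) // 4) % 2).astype(float)
rectified = np.tile(row, (32, 1))

# Crude "distortion": shift row i by i pixels with wrap-around (a discrete shear).
sheared = np.stack([np.roll(row, i) for i in range(32)])

rank_rectified = np.linalg.matrix_rank(rectified)   # 1, since all rows are the same
rank_sheared = np.linalg.matrix_rank(sheared)       # larger: the shear destroys the low-rank structure
```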
How to incorporate geometric knowledge into data-driven learning approaches?
Multiple-View Reconstruction Recognition
Geometry:
o Points/junctions
o Lines
o Planes
o Incidence relations
o Symmetry
• Translation
• Reflection
• Rotation
[Ma, Soatto, Kosecka, Sastry, 2004]
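As a small illustration of reflective symmetry, assume a symmetry plane {P : nᵀP = d} with unit normal n. Mirroring a point is the affine map X′ = (I − 2nnᵀ)X + 2dn, and a single mirrored pair already determines the plane. In symmetry-based reconstruction, this is part of why one image of a symmetric object can carry multiple-view information: the mirrored half behaves like a second view. The sketch shows only the 3-D geometry; the function names and plane parameters are hypothetical:

```python
import numpy as np


def reflect(X, n, d):
    """Mirror N x 3 points across the plane {P : n . P = d}; n must be unit length."""
    return X - 2.0 * (X @ n - d)[:, None] * n[None, :]


def plane_from_pair(p, p_mirror):
    """Recover (n, d) of the symmetry plane from one point and its mirror image."""
    n = (p - p_mirror) / np.linalg.norm(p - p_mirror)   # plane normal (up to sign)
    d = n @ (p + p_mirror) / 2.0                        # the midpoint lies on the plane
    return n, d


n = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)   # arbitrary symmetry plane
d = 0.5
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
X_mirror = reflect(X, n, d)

n_hat, d_hat = plane_from_pair(X[0], X_mirror[0])   # (n, d) or the equivalent (-n, -d)
```

The sign ambiguity is harmless: (n, d) and (−n, −d) describe the same plane and give the same reflection.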
Combine Geometry and Learning (for Structures)
From Images to CAD Model
[Diagram: multi-view correspondence, end-to-end learning, geometric structure, data representation]
Learning with Structures, and for Structures, Yichao Zhou, UC Berkeley
Combine Geometry and Learning (for Structures)
Wireframes (junctions, lines, planes):
• Learning to Reconstruct 3D Wireframes from Single Images (ICCV 2019)
• L-CNN: End-to-end Wireframe Parsing (ICCV 2019)
Vanishing points (parallel, orthogonality):
• NeurVPS: Neural Vanishing Point Scanning via Conic Convolution (NeurIPS 2019)
Symmetry (reflective, rotation, translation):
• NeRD: Neural 3D Reflection Symmetry Detector (CVPR 2021)
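Vanishing points make parallelism measurable in a single image. In homogeneous coordinates, the line through two image points is their cross product, and two lines meet at the cross product of the lines; for a camera with intrinsics K and R = I, all 3-D lines with direction d share the vanishing point Kd. A minimal sketch with synthetic lines and a hypothetical K (values arbitrary, helper names illustrative):

```python
import numpy as np


def project(X, K):
    """Pinhole projection with R = I, T = 0: pixel = dehomogenize(K X)."""
    x = K @ X
    return x[:2] / x[2]


def line_through(p, q):
    """Homogeneous image line l with l . (x, y, 1) = 0 through pixels p and q."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])


def intersect(l1, l2):
    """Intersection of two homogeneous lines (assumed not parallel in the image)."""
    v = np.cross(l1, l2)
    return v[:2] / v[2]


K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
direction = np.array([1.0, 0.0, 1.0])      # shared 3-D direction of both lines
starts = [np.array([0.0, 1.0, 4.0]), np.array([0.0, -1.0, 4.0])]

# Each image line is fitted through the projections of two points on one 3-D line.
lines = [line_through(project(s, K), project(s + 2.0 * direction, K)) for s in starts]
vp = intersect(lines[0], lines[1])
vp_expected = project(direction, K)        # K d, dehomogenized: (820, 240)
```

With real images, the line segments are noisy and many segments vote for each vanishing point, so a robust or least-squares estimate replaces the single cross product.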
Holistic Scene Structures for 3D Vision
https://holistic-3d.github.io/iccv19/
From Images to 3D CAD Models
Holicity: 20 km² of downtown London
Yichao Zhou, Yi Ma, et al., UC Berkeley, https://holicity.io
Evolution of Interface and Media
From 1D to 3D, and from physical to virtual (meta?)…
[Figure: 1D media, 2D media, 3D media; captions: “Quipu, Inca people” and “3rd millennium BCE”]
More Applications – 3D Object Digitization
With 3D vision, learning, and light-field technology at its core, one can develop live virtual 3D digital technologies:
• Digital Human Reconstruction
• Live Holography
• 3D Reconstruction
• Interactive Videos
https://www.us1.dgene.com
More Applications – Digital Arts
[Figure: Shanghai Museum items on an iPhone VR kit]
https://www.us1.dgene.com
More Applications – Virtual Shopping
https://www.us1.dgene.com
More Applications – Virtual Performance & Entertainment
https://www.us1.dgene.com
Reconstruction from images – The Fundamental Problem
“Rome wasn’t built in a day.”
But a digital Rome may be built in a day!
Let us start from the foundation...