Neural Ordinary Differential Equations
Ricky T. Q. Chen*, Yulia Rubanova*, Jesse Bettencourt*, David Duvenaud
University of Toronto
Background: Ordinary Differential Equations (ODEs)
- Model the instantaneous change of a state (explicit form):
  dz(t)/dt = f(z(t), t)
- Solving an initial value problem (IVP) corresponds to integration; the solution is a trajectory:
  z(t1) = z(t0) + ∫ f(z(t), t) dt, integrated from t0 to t1
- The Euler method approximates the solution with small steps of size h:
  z(t + h) = z(t) + h f(z(t), t)
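As a concrete illustration (not from the original slides), here is a minimal fixed-step Euler integrator; f, z0, and the step count are placeholders:

```python
def euler_solve(f, z0, t0, t1, n_steps=100):
    """Integrate dz/dt = f(z, t) from t0 to t1 with fixed-step Euler."""
    h = (t1 - t0) / n_steps          # small step size
    z, t = z0, t0
    for _ in range(n_steps):
        z = z + h * f(z, t)          # Euler update: z(t+h) = z(t) + h f(z(t), t)
        t = t + h
    return z                         # approximate z(t1)
```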
Residual Networks interpreted as an ODE Solver
- Hidden units look like:
  h_{t+1} = h_t + f(h_t, θ_t)
- Final output is the composition of these steps:
  h_T = (F_T ∘ … ∘ F_1)(h_0), where F_t(h) = h + f(h, θ_t)
- This can be interpreted as an Euler discretization of an ODE.
- In the limit of smaller steps:
  dh(t)/dt = f(h(t), t, θ)
Haber & Ruthotto (2017). Weinan E (2017).
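To make the analogy concrete, a sketch (my own, not from the slides) of a residual network's forward pass written as repeated Euler updates with step size 1; blocks is an assumed list of residual functions f(·, θ_t):

```python
def resnet_forward(blocks, h):
    """Each residual block is one forward-Euler step of size 1."""
    for f in blocks:
        h = h + f(h)   # h_{t+1} = h_t + f(h_t, theta_t)
    return h
```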
Deep Learning as Discretized Differential Equations
Many deep learning networks can be interpreted as ODE solvers.

Network                          Fixed-step Numerical Scheme
ResNet, RevNet, ResNeXt, etc.    Forward Euler (Lu et al. 2017; Chang et al. 2018)
PolyNet                          Approximation to Backward Euler (Zhu et al. 2018)
FractalNet                       Runge-Kutta
DenseNet                         Runge-Kutta

But:
(1) What are the underlying dynamics?
(2) Adaptive-step-size solvers provide better error handling.
“Neural” Ordinary Differential Equations
Instead of y = F(x), solve y = z(T)
given the initial condition z(0) = x.
Parameterize the dynamics with a neural network:
  dz(t)/dt = f(z(t), t, θ)
Solve the dynamics using any black-box ODE solver:
- Adaptive step size.
- Error estimates.
- O(1)-memory learning.
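A minimal usage sketch with the authors' torchdiffeq library (github.com/rtqichen/torchdiffeq); the two-layer dynamics network and the data here are made up for illustration:

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint_adjoint as odeint  # black-box solver with adjoint backprop

class Dynamics(nn.Module):
    """Parameterized dynamics f(z(t), t, theta)."""
    def __init__(self, dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, dim))

    def forward(self, t, z):
        return self.net(z)

f = Dynamics()
x = torch.randn(32, 2)        # initial condition z(0) = x
t = torch.tensor([0.0, 1.0])  # integrate from t = 0 to t = T
y = odeint(f, x, t)[-1]       # y = z(T)
```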
Backprop without Knowledge of the ODE Solver
Ultimately we want to optimize some loss
  L(z(T)) = L(ODESolve(z(0), f, 0, T, θ))
Naive approach: know the solver, and backprop through the solver.
- Memory-intensive.
- The family of “implicit” solvers performs an inner optimization, which is hard to backprop through.
Our approach: adjoint sensitivity analysis (reverse-mode autodiff through the ODE solution).
- Pontryagin (1962) + automatic differentiation.
- O(1) memory in the backward pass.
Continuous-time Backpropagation
Define the adjoint state a(t) = ∂L/∂z(t).

Residual network:
- Forward: z_{t+1} = z_t + f(z_t, θ)
- Backward: a_t = a_{t+1} + a_{t+1} ∂f(z_t, θ)/∂z
- Params: ∂L/∂θ = Σ_t a_{t+1} ∂f(z_t, θ)/∂θ

Adjoint method:
- Forward: z(t1) = z(t0) + ∫ f(z(t), t, θ) dt
- Backward (adjoint diffeq): da(t)/dt = −a(t) ∂f(z(t), t, θ)/∂z
- Params: ∂L/∂θ = −∫ a(t) ∂f(z(t), t, θ)/∂θ dt, integrated backwards from t1 to t0
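A rough sketch of the adjoint equations above written as dynamics one could hand to a solver; this is my own illustration, not the library's actual implementation:

```python
import torch

def augmented_dynamics(f, t, state):
    """Backward-pass dynamics for the adjoint method (a sketch):
    state = (z, a), with dz/dt = f(t, z) and da/dt = -a df/dz."""
    z, a = state
    with torch.enable_grad():
        z = z.detach().requires_grad_(True)
        dz = f(t, z)
        # a^T (df/dz) as a single reverse-mode vector-Jacobian product
        vjp = torch.autograd.grad(dz, z, grad_outputs=a)[0]
    # parameter gradients accumulate analogously from -a df/dtheta
    return dz, -vjp
```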
A Differentiable Primitive for AutoDiff
Forward: z(t1) = ODESolve(z(t0), f, t0, t1, θ)
Backward: a second ODESolve runs the augmented state [z(t), a(t), ∂L/∂θ] backwards from t1 to t0.
A Differentiable Primitive for AutoDiff
Don’t need to store layer activations for the reverse pass - just follow the dynamics in reverse!
Reversible networks (Gomez et al. 2017) also require only O(1) memory, but need very specific neural network architectures with partitioned dimensions.
Reverse versus Forward Cost
- Empirically, the reverse pass is roughly half as expensive as the forward pass.
- Adapts to instance difficulty.
- The number of evaluations can be viewed as the number of layers in a neural net.
NFE = Number of Function Evaluations.
Dynamics Become Increasingly Complex
- Dynamics become more demanding to compute during training.
- Computation time adapts to the complexity of the diffeq.
In contrast, Chang et al. (ICLR 2018) explicitly add layers during training.
Continuous-time RNNs for Time Series Modeling
- We often want arbitrary measurement times, i.e., irregular time intervals.
- Can do VAE-style inference with a latent ODE, as in the sketch below.
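For instance, with torchdiffeq a single solver call returns the latent state at arbitrary, irregularly spaced measurement times; func and z0 are assumed from the earlier sketch:

```python
import torch
from torchdiffeq import odeint

# Observation times need not be uniform.
t_obs = torch.tensor([0.0, 0.3, 0.35, 1.2, 2.0])
z_traj = odeint(func, z0, t_obs)   # latent state at every measurement time
```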
ODEs vs Recurrent Neural Networks (RNNs)
- RNNs learn very stiff dynamics and have exploding gradients.
- Whereas ODEs are guaranteed to be smooth.
Continuous Normalizing Flows
Instantaneous Change of Variables (iCOV):
- For a Lipschitz-continuous function f:
  ∂ log p(z(t)) / ∂t = −tr( ∂f/∂z(t) )
- In other words:
  log p(z(t1)) = log p(z(t0)) − ∫ tr( ∂f/∂z(t) ) dt
Compare with an invertible F in a discrete normalizing flow:
  log p(y) = log p(z) − log |det ∂F/∂z|
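A sketch of the iCOV dynamics with an exact trace, taking one vector-Jacobian product per dimension (my illustration; practical for small D):

```python
import torch

def cnf_dynamics(f, t, z):
    """Joint dynamics of state and log-density:
    dz/dt = f(t, z), d log p(z(t))/dt = -tr(df/dz)."""
    with torch.enable_grad():
        z = z.detach().requires_grad_(True)
        dz = f(t, z)
        # exact trace: sum the Jacobian's diagonal, one VJP per dimension
        trace = sum(torch.autograd.grad(dz[:, i].sum(), z, create_graph=True)[0][:, i]
                    for i in range(z.shape[1]))
    return dz, -trace
```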
Continuous Normalizing Flows
[Figure: learned 1D and 2D densities; columns compare Data, Discrete-NF, and CNF.]
Is the ODE being correctly solved?
Stochastic Unbiased Log Density
Can further reduce time complexity using stochastic estimators, e.g. Hutchinson’s trace estimator: tr(A) = E[εᵀ A ε].
Grathwohl et al. (2019)
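A sketch of that stochastic estimator: a single Rademacher probe replaces the D exact-trace terms with one vector-Jacobian product (my illustration, under the assumptions of the earlier sketches):

```python
import torch

def hutchinson_trace(f, t, z):
    """One-sample unbiased estimate of tr(df/dz): tr(A) = E[eps^T A eps]."""
    with torch.enable_grad():
        z = z.detach().requires_grad_(True)
        dz = f(t, z)
        eps = torch.randint(0, 2, z.shape, dtype=z.dtype) * 2 - 1  # Rademacher probe
        # eps^T (df/dz) via a single vector-Jacobian product
        vjp = torch.autograd.grad(dz, z, grad_outputs=eps, create_graph=True)[0]
    return (vjp * eps).sum(dim=1)   # eps^T (df/dz) eps, per batch element
```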
FFJORD - Stochastic Continuous Flows
[Figure: model samples on MNIST and CIFAR10.]
Grathwohl et al. (2019)
Variational Autoencoders with FFJORD
ODE Solving as a Modeling Primitive
Adaptive-step solvers with O(1) memory backprop.
github.com/rtqichen/torchdiffeq
Future directions we’re currently working on:
- Latent Stochastic Differential Equations.
- Network architectures suited for ODEs.
- Regularization of dynamics to require fewer evaluations.
Co-authors:
Yulia Rubanova, Jesse Bettencourt, David Duvenaud
Thanks!
Extra Slides
Latent Space Visualizations
• Released an implementation of reverse-mode autodiff through black-box ODE solvers.
• Solves a system of size 2D + K + 1.
• In contrast, a forward-mode implementation solves a system of size D^2 + KD.
• TensorFlow has Dormand-Prince-Shampine Runge-Kutta 5(4) implemented, but uses naive autodiff for backpropagation.
How much precision is needed?
Explicit Error Control
- More fine-grained control than low-precision floats.
- Cost scales with instance difficulty.
NFE = Number of Function Evaluations.
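With torchdiffeq, this control is exposed through solver tolerances; the values below are illustrative, with f, x, and t as in the earlier sketch:

```python
from torchdiffeq import odeint

# Tighter tolerances -> more function evaluations (NFE) and higher precision.
z = odeint(f, x, t, rtol=1e-3, atol=1e-4, method='dopri5')
```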
Computation Depends on Complexity of Dynamics
- Time cost is dominated by evaluation of the dynamics f.
NFE = Number of Function Evaluations.
Why not use an ODE solver as a modeling primitive?
- Solving an ODE is expensive.
Future Directions
- Stochastic differential equations and random ODEs, which approximate stochastic gradient descent.
- Scaling up ODE solvers with machine learning.
- Partial differential equations.
- Graphics, physics, simulations.