Deep Learning and Inverse Problems
Ali Mohammad-Djafari (ORCID: 0000-0003-0678-7759), Ning Chu, Li Wang, Liang Yu
Presented at MaxEnt 2023: International Workshop on Bayesian Inference and Maximum Entropy
Methods in Science and Engineering, Max Planck Institut, Garching, Germany, July 3-7, 2023.
A modified and combined version of this paper will appear in MaxEnt2023 Proceedings.
Abstract
Machine Learning (ML) methods and tools have gained great success in many data, signal, image and video processing tasks, such as classification, clustering, object detection, semantic segmentation, language processing, Human-Machine interfaces, etc. In computer vision, image and video processing, these methods are mainly based on Neural Networks (NN), in particular Convolutional NN (CNN), and more generally Deep NN.
Inverse problems arise wherever we have indirect measurements. As, in general, these inverse problems are ill-posed, obtaining satisfactory solutions for them requires prior information. Different regularization methods have been proposed, where the problem becomes the optimization of a criterion with a likelihood term and a regularization term. The main difficulty, however, in high-dimensional real applications, remains the computational cost. Here, NN, and in particular Deep Learning (DL) surrogate models and approximate computation, can become very helpful.
In this work, we focus on NN and DL methods particularly adapted for inverse problems. We consider two cases: first, the case where the forward operator is known and used as a physics constraint; second, more general data-driven DL methods.
Keywords: Neural Networks, Deep Learning (DL), inverse problems, physics-based DL
1 Introduction
In science and engineering, we need to observe (measure) quantities. Some quantities are directly observable (e.g., length), and some others are not (e.g., temperature). For example, to measure the temperature, we need an instrument (thermometer) that measures the length of the liquid in the thermometer tube, which can be related to the temperature. We may also want to observe its variation in time or its spatial distribution. One way to measure the spatial distribution of the temperature is to use an Infra-Red (IR) camera. But, in general, all these instruments give indirect measurements, related to what we really want to measure through some mathematical relation, called the forward model. We then have to infer the desired unknowns from the observed data, using this forward model or a surrogate one [1].
As, in general, many inverse problems are ill-posed, many classical methods for finding well-posed solutions for them are based on regularization theory. We may mention, in particular, those based on the optimization of a criterion with two parts: a data-model output matching criterion and a regularization term. Different criteria for these two terms and a great number of standard and advanced optimization algorithms have been proposed and used with great success. When these two terms are distances, they can have a Bayesian Maximum A Posteriori (MAP) interpretation, where they correspond, respectively, to the likelihood and prior probability models [1].
The Bayesian approach gives more flexibility in choosing these terms via the likelihood and the prior probability distributions. This flexibility goes much further with hierarchical models and appropriate hidden variables [2]. Also, the possibility of estimating the hyper-parameters gives much more flexibility for semi-supervised methods.
However, full Bayesian computations can become very heavy, in particular when the forward model is complex and the evaluation of the likelihood requires a high computational cost. In those cases, simpler surrogate models can become very helpful to reduce the computational costs, but we then have to account for uncertainty quantification (UQ) of the obtained results [3]. Neural Networks (NN), in their diversity, such as Convolutional NN (CNN), Deep Learning (DL), etc., have become fast and computationally cheap surrogate forward models for this purpose.
In the last three decades, Machine Learning (ML) methods and algorithms have gained great success in many computer vision (CV) tasks, such as classification, clustering, object detection, semantic segmentation, etc. These methods are mainly based on Neural Networks (NN), in particular Convolutional NN (CNN), Deep NN, etc. [4-10].
Using these methods directly for inverse problems, as intermediate pre-processing, or as tools for fast approximate computation in different steps of regularization or Bayesian inference has also seen success, but not yet as much as it could. Recently, Physics-Informed Neural Networks have gained great success in many inverse problems, proposing interactions between the Bayesian formulation of forward models, optimization algorithms, and ML-specific algorithms for intermediate hidden variables. These methods have become very helpful for obtaining approximate practical solutions to inverse problems in real-world applications [6, 8, 11-17].
In this paper, first, in Section 2, a few general ideas of ML, NN and DL are summarized; then, in Sections 3, 4 and 5, we focus on the NN and DL methods for inverse problems. First, we present some cases where we know the forward model and its adjoint. Then, we consider the case where we may not have this knowledge and want to propose directly data-driven DL methods [18, 19].
2 Machine Learning and Neural Networks: basic ideas
The main idea in Machine Learning is first to learn a model ϕ_W(x) from a great number of input-output training data; for example, in a supervised classification problem, labeled data (x_i, c_i), i = 1, · · · , N:

(x_i, c_i), i = 1, · · · , N → Learning step → W,

where the weights W of the NN model ϕ_W(x) are obtained. Then, when a new case (test input x_j) appears, the learned weights W are used to give a decision c_j = ϕ_W(x_j).
[Figure 1 diagram: training data (x_i, c_i), i = 1, · · · , N feed a machine learning algorithm that produces the model ϕ_W(x); novel data x_j then pass through the model to give the output c_j.]
Figure 1: Basic Machine Learning process: first learn a model, then use it. The learning step needs a rich enough database, which is costly. Once the model is learned and tested, its use is easy, fast and cheap.
For the quadratic regularization criterion J(f) = ∥g − Hf∥² + λ∥f∥², the solution f̂ = (H^tH + λI)^{-1}H^tg can be written in the equivalent forms f̂ = Ag, f̂ = BH^tg and (up to a factor 1/λ absorbed in the weights) f̂ = H^tCg, where A = (H^tH + λI)^{-1}H^t, B = (H^tH + λI)^{-1} and C = ((1/λ)HH^t + I)^{-1}.
These relations can be presented schematically as:
g → A → f̂,    g → H^t → B → f̂,    g → C → H^t → f̂.
As we can see, these relations directly induce a linear feed-forward NN structure. In particular, if H represents a convolution operator, then H^t, H^tH and HH^t do too, as well as the operators B and C. Thus the whole inversion can be modelled by a CNN [5, 10].
For the case of Computed Tomography (CT), the first operation is equivalent to an analytic inversion, the second corresponds to Back-Projection followed by 2D filtering in the image domain, and the third corresponds to the famous Filtered Back-Projection (FBP), which is implemented on classical CT scanners. These three cases are illustrated in Figure 2.
[Figure 2 diagram: (top) analytical inversion g → A → f̂ as a direct NN; (middle) g → H^t → B → f̂, back-projection followed by 2D filtering by a NN; (bottom) g → C → H^t → f̂, filtering by a NN followed by back-projection.]
Figure 2: Three linear NN structures which are derived directly from the quadratic regularization inversion method. The right part of this figure is adapted from [10].
In a second example, the inversion is expressed in a transform domain D (e.g., a wavelet or other sparsifying transform) with a sparsity-enforcing (ℓ1) regularization term, which leads to a thresholded solution:

f̂ = Dẑ and ẑ = S_{1/λ}(D^tg),   (4)

which can be presented schematically as:

g → D^t → Thresholding → ẑ → D → f̂,  or equivalently,  g → two-layer CNN → f̂.
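A minimal sketch of this two-layer analysis-thresholding-synthesis structure (ours; the orthonormal DCT is only a stand-in for the unspecified transform D, and the threshold follows Eq. (4)):

import numpy as np
from scipy.fft import dct, idct

def soft_threshold(x, t):
    # Soft-thresholding operator S_t(x)
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def two_layer_inversion(g, lam):
    z = dct(g, norm="ortho")                  # layer 1: analysis, D^t g
    z_hat = soft_threshold(z, 1.0 / lam)      # point-wise nonlinearity S_{1/lambda}
    return idct(z_hat, norm="ortho")          # layer 2: synthesis, f_hat = D z_hat

g = np.random.default_rng(1).standard_normal(128)   # toy data
f_hat = two_layer_inversion(g, lam=5.0)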
3.3 Third example: a Deep Learning equivalent of iterative gradient-based algorithms
One of the classical iterative methods for linear inverse problems is based on the gradient descent method to optimize J(f) = ∥g − Hf∥²:

f^(k+1) = f^(k) + αH^t(g − Hf^(k)) = (I − αH^tH)f^(k) + αH^tg,

where the solution of the problem is obtained recursively. It is well known that, when the forward model operator H is singular or ill-conditioned, this iterative algorithm starts by converging, but may then easily diverge. One experimental method to obtain an acceptable approximate solution is simply to stop after K iterations. This idea can be translated to a Deep Learning NN with K layers, each layer representing one iteration of the algorithm. See Figure 3.
[Figure 3 diagram: f^(1) passes through K identical layers, each computing f^(k+1) = (I − αH^tH)f^(k) + αH^tg, to produce f^(K).]
Considering now the ℓ1-regularized criterion

J(f) = ∥g − Hf∥₂² + λ∥f∥₁,   (7)

the same gradient step combined with a thresholding (proximal) step gives the iterations

f^(k+1) = S_θ((I − αH^tH)f^(k) + αH^tg),

where S_θ is a soft-thresholding operator and α ≤ 1/L, with L = max|eig(H^tH)| the Lipschitz constant of the gradient of the data term. When H is a convolution operator, then:
• (I − αH^tH)f^(k) can also be approximated by a convolution and thus considered as a filtering operator;
• αH^tg can be considered as a bias term and is also a convolution operator; and
• S_{θ=λα} is a nonlinear point-wise operator. In particular, when f is a positive quantity, this soft-thresholding operator can be compared to the ReLU activation function of a NN. See Figure 4.
[Figure 4 diagram: one layer of the unrolled algorithm, computing f^(k+1) = S_θ((I − αH^tH)f^(k) + αH^tg) from f^(k) and the bias αH^tg.]
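The following sketch (ours; H, λ and K are arbitrary illustrative choices) unrolls these thresholded gradient iterations as a K-layer network with shared filtering weight W = I − αH^tH and bias b = αH^tg:

import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def unrolled_ista(g, H, lam, K):
    L = np.max(np.linalg.eigvalsh(H.T @ H))     # Lipschitz constant of H^t H
    alpha = 1.0 / L
    W = np.eye(H.shape[1]) - alpha * (H.T @ H)  # shared layer weight I - alpha H^t H
    b = alpha * (H.T @ g)                       # bias term alpha H^t g
    f = np.zeros(H.shape[1])
    for _ in range(K):                          # one network layer per iteration
        f = soft_threshold(W @ f + b, lam * alpha)
    return f

rng = np.random.default_rng(2)
H = rng.standard_normal((30, 50))               # toy underdetermined operator
f_true = np.zeros(50)
f_true[[5, 17, 33]] = [1.0, -2.0, 1.5]          # sparse ground truth
g = H @ f_true + 0.01 * rng.standard_normal(30)
f_hat = unrolled_ista(g, H, lam=0.1, K=200)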
In all these examples, we could directly obtain the structure of the NN from the forward model and known parameters. However, these approaches have some difficulties, which lie in the determination of the structure of the NN. For example, in the first example, obtaining the structure of B depends on the regularization parameter λ. The same difficulty arises for determining the shape and the threshold level of the Thresholding block of the network in the second example. The regularization parameter, as well as many other hyper-parameters, is needed to create the NN structure and weights. In practice, we can decide, for example, on the number and structure of the layers of a DL
[Figure 6 diagram: a chain of K layers producing f̂^(1), …, f̂^(K); at each layer k, the data g enter through W0 and are added to W^(k) applied to the previous layer's output.]
Figure 6: The K layers of a DL NN equivalent to K iterations of an iterative gradient-based optimization algorithm. The simplest solution is to choose W0 = αH^t and W^(k) = W = (I − αH^tH), k = 1, · · · , K. A more robust, but more costly, option is to learn all the layers W^(k) = (I − α^(k)H^tH), k = 1, · · · , K.
network, but as their corresponding weights depend on many unknown or difficult-to-fix parameters, ML may become of help. In the following, we first consider the training part of a general ML method. Then, we will see how to include the physics-based knowledge of the forward model in the structure of learning.
One possible decomposition of the NN structure uses the pseudo-inverse of the forward operator:

H† = [H^tH]^{-1}H^t  or  H† = H^t[HH^t]^{-1},   (9)
[Figure 7 diagram: training data (g_k, f_k), k = 1, · · · , K pass through a fixed physics-based part, f̃_k = H^tg_k, then a trainable part whose parameters are obtained as B̂ = arg min_B Σ_{k=1}^K ∥f_k − ϕ(Bf̃_k)∥² + λR(B); at test time, new data g pass through the same fixed part and the learned B̂.]
Figure 7: Training (top) and testing (bottom) steps in the first use of the physics-based ML approach.
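A small sketch of this training scheme (ours; ϕ is taken as the identity and R(B) = ∥B∥²_F, so the trainable part has a closed-form ridge-regression solution; all sizes are illustrative):

import numpy as np

rng = np.random.default_rng(3)
m, n, K = 40, 60, 500
H = rng.standard_normal((m, n))                     # toy forward operator

F = rng.standard_normal((K, n))                     # ground-truth examples f_k
G = F @ H.T + 0.01 * rng.standard_normal((K, m))    # data g_k = H f_k + noise
F_tilde = G @ H                                     # fixed physics part: H^t g_k

lam = 1e-2
# Trainable part: B = argmin_B sum_k ||f_k - B f_tilde_k||^2 + lam ||B||_F^2
B = np.linalg.solve(F_tilde.T @ F_tilde + lam * np.eye(n), F_tilde.T @ F).T

g_test = H @ rng.standard_normal(n)                 # novel measurement
f_hat = B @ (H.T @ g_test)                          # test step: fixed part, then B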
The Singular Value Decomposition (SVD) of the operators [H^tH] and [HH^t] gives another possible decomposition of the NN structure. Let us note

[H^tH] = U∆U′  and  [HH^t] = V∆V′,   (10)

where ∆ is a diagonal matrix containing the singular values, and U and V contain the corresponding eigenvectors. This can be used to decompose W into four operators:

W = V′∆UH^t  or  W = H^tV∆U′,   (11)
where three of them can be fixed and only one, ∆, is trainable. It is interesting to note that when the forward operator H has a shift-invariant (convolution) property, the operators U and V′ correspond, respectively, to the FT and IFT operators, and the diagonal elements of ∆ correspond to the FT of the impulse response of the convolution forward operator. So, we will have a fixed layer corresponding to H^t, which can be interpreted as matched filtering, then a fixed FT layer, which is a feed-forward linear network, a trainable filtering part corresponding to the diagonal elements of ∆, and a fourth fixed layer corresponding to the IFT. See Figure 8.
Figure 8: A four-layer NN with three fixed physics-based layers, corresponding to H^t, U′ (FT) and V (IFT), and one trainable layer corresponding to ∆.
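A hedged sketch of this four-layer structure for a 1-D convolutional H (ours; the kernel and the parametrization of the trainable diagonal are illustrative assumptions), written so the diagonal is trainable by backpropagation:

import torch

class FourLayerInverter(torch.nn.Module):
    # Fixed matched filter H^t, fixed FFT, trainable diagonal, fixed IFFT.
    def __init__(self, h):
        super().__init__()
        self.h_fft = torch.fft.fft(h)                     # spectrum of the blur kernel
        n = h.numel()
        self.diag_re = torch.nn.Parameter(torch.ones(n))  # trainable diagonal (real part)
        self.diag_im = torch.nn.Parameter(torch.zeros(n)) # trainable diagonal (imag part)

    def forward(self, g):
        G = torch.fft.fft(g)                              # fixed FT layer
        Gm = torch.conj(self.h_fft) * G                   # fixed H^t (matched filtering)
        D = torch.complex(self.diag_re, self.diag_im)     # trainable filtering part
        return torch.fft.ifft(D * Gm).real                # fixed IFT layer

n = 64
h = torch.zeros(n)
h[:5] = torch.tensor([0.1, 0.2, 0.4, 0.2, 0.1])           # toy convolution kernel
net = FourLayerInverter(h)
f_hat = net(torch.randn(n))                               # differentiable w.r.t. the diagonal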
The main issue is the limited number of input-output data examples {(f, g)_k, k = 1, 2, ..., K} available for the training step of the network.

The scheme that we presented is general and can be extended to any multi-layer NN and DL. In fact, if we had a great number of data-ground-truth examples {(f, g)_k, k = 1, 2, ..., K}, with K much larger than the number of elements W_{m,n} of the weighting parameters W, then we would not even need the forward model H. This can be possible for very low-dimensional problems [10]. But, in general, in practice we do not have enough data, so some prior or regularizer is needed to obtain a usable solution.
[Figure 10 diagram: input IR image g → denoising (C1 − Th − C2) → deconvolution (C3 − Thr − C4) → SegNet → segmented image.]
Figure 10: The proposed NN, with four groups of layers, for the denoising, deconvolution and segmentation of IR images.
Figure 11: Example of expected results. First row: a simulated IR image (left), its ground-truth labels (middle), and the result of the deconvolution and segmentation (right). Second row: a real IR image (left) and the result of its deconvolution and segmentation (right).
To train this NN, we can generate synthetic images with different known shapes to serve as ground truth, and simulate the blurring effect of temperature diffusion via convolution with different appropriate point spread functions, adding some noise to generate realistic images. We can also use black-body thermal sources, for which we know the shape and the exact temperature, and acquire different images under different conditions. All these images can be used for the training of the network.
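A minimal sketch of this data-generation step (ours; rectangular hot spots, a Gaussian PSF and a fixed noise level stand in for the actual simulation):

import numpy as np
from scipy.ndimage import gaussian_filter

def make_pair(rng, size=64, sigma=2.0, noise=0.02):
    # One (ground truth, simulated IR image) training pair.
    f = np.zeros((size, size))
    x0, y0 = rng.integers(10, size - 25, size=2)       # random position
    w, h = rng.integers(5, 15, size=2)                 # random extent
    f[y0:y0 + h, x0:x0 + w] = rng.uniform(0.5, 1.0)    # rectangular hot spot
    g = gaussian_filter(f, sigma)                      # diffusion blur (Gaussian PSF)
    g += noise * rng.standard_normal(f.shape)          # measurement noise
    return f, g

rng = np.random.default_rng(4)
pairs = [make_pair(rng) for _ in range(100)]           # 100 simulated training images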
We propose then to use a DL structure with four groups of layers, as shown in Figure 10, and to train it with one hundred artificially generated images and one hundred images obtained with a black-body experiment. The trained model can then be used for the desired task on a test set of images. In Figure 11, we show one such expected result. More details will be given in a forthcoming paper.
7 Conclusions and Challenges
Classical methods for inverse problems are mainly based on regularization methods or on Bayesian
inference with a connection between them via the Maximum A Posteriori (MAP) point estimation.
The Bayesian approach gives more flexibility, in particular for the determination of the regularization parameter. However, whether deterministic or Bayesian, the computations remain a great problem for high-dimensional problems.
Recently, Machine Learning (ML) methods have become a great help for some aspects of these difficulties. Nowadays, ML, Neural Networks (NN), Convolutional NN (CNN) and Deep Learning (DL) methods have obtained great success in classification, clustering, object detection, speech and face recognition, etc. But they need a great number of training data, and they may fail very easily, in particular for inverse problems.
In fact, using only data-based NN, without any specific structure coming from the forward model (physics), may work for small-size problems. However, the progress arrives via their interaction with the model-based methods. In fact, the success of CNN and DL methods greatly depends on the appropriate choice of the network structure. This choice can be guided by the model-based methods [3, 4, 10, 20-25].
In this work, we presented a few examples of such interactions. We explored a few cases: first, when the forward operator is known; then, when we use the forward model partially or in the transform domain. As we could see, the main contribution of ML and NN tools can be in reducing the costs of the inversion method once an appropriate model is trained. However, to obtain a good model, there is a need for sufficiently rich data and a good network structure obtained from the physics knowledge of the problem at hand.
For inverse problems where the forward models are nonlinear and complex, NN and DL may be of great help. However, we may still need to choose the structure of the NN via an approximate forward model and approximate Bayesian inversion [11, 12, 14].
References
[1] A. Mohammad-Djafari, "Inverse problems in imaging science: from classical regularization methods to state-of-the-art Bayesian methods," in International Image Processing, Applications and Systems Conference, pp. 1-2, Nov 2014.
[2] H. Ayasso and A. Mohammad-Djafari, "Joint NDT image restoration and segmentation using Gauss-Markov-Potts prior models and variational Bayesian computation," IEEE Transactions on Image Processing, vol. 19, pp. 2265-2277, Sept 2010.
[3] Y. Zhu and N. Zabaras, “Bayesian deep convolutional encoder–decoder networks for surrogate
modeling and uncertainty quantification,” Journal of Computational Physics, 2018.
[4] M. Unser, K. H. Jin, and M. T. McCann, “A review of convolutional neural networks for inverse
problems in imaging,” ArXiv, 2017.
[5] I. Y. Chun, Z. Huang, H. Lim, and J. Fessler, “Momentum-net: Fast and convergent iterative neu-
ral network for inverse problems,” IEEE Transactions on Pattern Analysis and Machine Intelligence,
pp. 1–1, 2020.
[6] Z. Fang, “A high-efficient hybrid physics-informed neural networks based on convolutional neu-
ral network,” IEEE Transactions on Neural Networks and Learning Systems, pp. 1–13, 2021.
[7] G. Ongie, A. Jalal, C. A. Metzler, R. G. Baraniuk, A. G. Dimakis, and R. Willett, “Deep learning
techniques for inverse problems in imaging,” IEEE Journal on Selected Areas in Information Theory,
vol. 1, no. 1, pp. 39–56, 2020.
[8] D. Gong, Z. Zhang, Q. Shi, A. van den Hengel, C. Shen, and Y. Zhang, “Learning deep gradient
descent optimization for image deconvolution,” IEEE Transactions on Neural Networks and Learning
Systems, vol. 31, no. 12, pp. 5468–5482, 2020.
[9] S. Ren, K. Sun, C. Tan, and F. Dong, “A two-stage deep learning method for robust shape recon-
struction with electrical impedance tomography,” IEEE Transactions on Instrumentation and Mea-
surement, vol. 69, no. 7, pp. 4887–4897, 2020.
[10] A. Lucas, M. Iliadis, R. Molina, and A. K. Katsaggelos, “Using deep neural networks for inverse
problems in imaging: Beyond analytical methods,” IEEE Signal Processing Magazine, 2018.
[11] M. Raissi, P. Perdikaris, and G. E. Karniadakis, “Physics informed deep learning (part i): Data-
driven solutions of nonlinear partial differential equations,” arXiv preprint arXiv:1711.10561, 2017.
[12] M. Raissi, P. Perdikaris, and G. E. Karniadakis, “Physics informed deep learning (part ii): Data-
driven discovery of nonlinear partial differential equations,” arXiv preprint arXiv:1711.10566, 2017.
[13] Y. Chen, L. Lu, G. E. Karniadakis, and L. D. Negro, “Physics-informed neural networks for inverse
problems in nano-optics and metamaterials,” arXiv: Computational Physics, 2019.
[14] M. Raissi, P. Perdikaris, and G. E. Karniadakis, “Physics-informed neural networks: A deep learn-
ing framework for solving forward and inverse problems involving nonlinear partial differential
equations,” Journal of Computational Physics, 2019.
[15] D. Gilton, G. Ongie, and R. Willett, “Neumann networks for linear inverse problems in imaging,”
IEEE Transactions on Computational Imaging, vol. 6, pp. 328–343, 2020.
[16] K. de Haan, Y. Rivenson, Y. Wu, and A. Ozcan, “Deep-learning-based image reconstruction and
enhancement in optical microscopy,” Proceedings of the IEEE, vol. 108, no. 1, pp. 30–50, 2020.
[17] H. K. Aggarwal, M. P. Mani, and M. Jacob, "MoDL: Model-based deep learning architecture for inverse problems," IEEE Transactions on Medical Imaging, vol. 38, no. 2, pp. 394-405, 2019.
[18] A. Mohammad-Djafari, "Hierarchical Markov modeling for fusion of X-ray radiographic data and anatomical data in computed tomography," in Proceedings IEEE International Symposium on Biomedical Imaging, pp. 401-404, July 2002.
[19] A. Mohammad-Djafari, "Regularization, Bayesian inference and machine learning methods for inverse problems," Entropy, vol. 23, no. 12, p. 1673, 2021.
[20] T. Meinhardt, M. Moeller, C. Hazirbas, and D. Cremers, “Learning proximal operators: Using de-
noising networks for regularizing inverse imaging problems,” arXiv: Computer Vision and Pattern
Recognition, 2017.
[21] S. Vettam and M. John, "Regularized deep learning with a non-convex penalty," arXiv: Machine Learning, 2019.
[22] R. Guidotti, A. Monreale, F. Turini, D. Pedreschi, and F. Giannotti, “A survey of methods for
explaining black box models,” arXiv: Computers and Society, 2018.
[23] K. H. Jin, M. T. McCann, and M. Unser, “A review of convolutional neural networks for inverse
problems in imaging,” ArXiv, 2017.
[24] J. H. R. Chang, C.-L. Li, B. Poczos, B. V. K. V. Kumar, and A. C. Sankaranarayanan, “One net-
work to solve them all — solving linear inverse problems using deep projection models,” arXiv:
Computer Vision and Pattern Recognition, 2017.
[25] S. Mo, N. Zabaras, X. Shi, and J. Wu, “Deep autoregressive neural networks for high-dimensional
inverse problems in groundwater contaminant source identification,” arXiv: Machine Learning,
2018.