
Introduction to Scientific Machine Learning for Engineers

Mock Exam WS23/24

Artur Toshev
[email protected]

December 22, 2023


Contents

1 Theoretical Part
  1.1 Linear Regression
  1.2 Support Vector Machine Decision Boundary
  1.3 Logistic Regression
  1.4 Connecting Gradients and Gradient Descent
  1.5 Maximum Likelihood Estimation I
  1.6 Maximum Likelihood Estimation II
  1.7 Gaussian Process Classification and the Influence of Kernels
  1.8 Backprop of MLP with one Hidden Layer

2 Practical Part
  2.1 Predictive Models: Mauna Loa Dataset
    2.1.1 Bayesian Linear Regression
    2.1.2 Connection between Bayesian Linear Regression and Gaussian Processes
    2.1.3 Gaussian Processes
  2.2 Classification on the OMNI2 dataset
    2.2.1 Model selection
    2.2.2 Implementation
    2.2.3 Performance Analysis

Chapter 1

Theoretical Part

Disclaimer: The theoretical part of the exam will contain around 45 Pts (1 Pt / 1 min), whereas only 38 Pts are provided here.

1.1 Linear Regression


1. [5 Pts] Assume we collect a set of 100 data points, each containing one predictor and one response, i.e. relating X to Y. We then fit a linear regression model to the data, as well as a separate cubic regression Y = β₀ + β₁X + β₂X² + β₃X³ + ϵ.

(a) [1 Pt] Suppose that the true relationship between X and Y is linear, i.e. Y = β₀ + β₁X + ϵ. Consider the training residual sum of squares for the linear regression, and also the training residual sum of squares for the cubic regression. Would we expect one to be lower than the other, would we expect them to be the same, or is there not enough information to tell? Please justify your answer.
(b) [1 Pt] Answer question (a) using the test rather than the training residual sum of squares.
(c) [1 Pt] Suppose that the true relationship between X and Y is not linear, but we do not know how far it is from linear. Consider the training residual sum of squares for the linear regression, and also the training residual sum of squares for the cubic regression. Would we expect one to be lower than the other, would we expect them to be the same, or is there not enough information to tell? Please justify your answer.
(d) [2 Pts] Answer (c) using the test rather than the training residual sum of squares.

Recall that the formula for the residual sum of squares is given by:

RSS = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²
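A minimal numerical sketch of the behaviour this question probes (not exam content; the linear ground truth, noise level, and sample sizes are arbitrary assumptions): fit both models and compare training and test RSS.

import numpy as np

rng = np.random.default_rng(0)

# Linear ground truth: Y = 2 + 3X + noise (arbitrary assumption)
def make_data(n):
    x = rng.uniform(-3, 3, n)
    y = 2 + 3 * x + rng.normal(0, 1, n)
    return x, y

x_train, y_train = make_data(100)
x_test, y_test = make_data(100)

def rss(y, y_hat):
    return np.sum((y - y_hat) ** 2)

for degree in (1, 3):  # linear vs. cubic regression
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares fit
    print(degree,
          rss(y_train, np.polyval(coeffs, x_train)),  # training RSS
          rss(y_test, np.polyval(coeffs, x_test)))    # test RSS

Since the cubic model nests the linear one, its training RSS is never higher; with a linear ground truth, its test RSS is typically slightly worse because of the extra variance.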

1.2 Support Vector Machine Decision Boundary
2. [3 Pts] For each of the three decision boundaries shown in Figures 1.1–1.3, determine which classifier could have generated the boundary and justify your choice. The options to consider are:

[Figure 1.1: decision boundary (1); Figure 1.2: decision boundary (2); Figure 1.3: decision boundary (3) — images not reproduced]

• Linear Support Vector Machine
• Logistic Regression
• Neural network with one hidden layer
• None of the above

1.3 Logistic Regression


3. [3 Pts] Suppose we apply logistic regression to 2-D samples, i.e. xᵢ ∈ R², with two target classes yᵢ ∈ {0, 1}. Our regression function is given by the typical logistic regression function

h(z) = s(wᵀz + a)     (1.1)

with s being the logistic function. As the regression function is unable to accurately fit the data, we resort to introducing a further feature ‖xᵢ‖₂, which is appended to each sample xᵢ. What form could your decision boundary then take?

• Line
• Circle
• Ellipse
• S-shaped logistic curve
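A quick sketch (assuming scikit-learn; the toy data with two concentric classes is an assumption for illustration) of appending the norm feature: the boundary is linear in the augmented 3-D space but traces a circle in the original 2-D space.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = (np.linalg.norm(X, axis=1) > 1.0).astype(int)  # label by distance from origin

# Append the norm feature ||x_i||_2 to each sample
X_aug = np.hstack([X, np.linalg.norm(X, axis=1, keepdims=True)])

clf = LogisticRegression().fit(X_aug, y)
# In the augmented space the boundary w^T [x1, x2, ||x||_2] + a = 0 is a plane;
# projected back to the original 2-D space it becomes a circle around the origin.
print(clf.coef_, clf.intercept_)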

1.4 Connecting Gradients and Gradient Descent
4. [7 Pts] Let ϕ(x) : R → Rᵈ, w ∈ Rᵈ. Consider the following objective function

Loss(x, y, w) =
    1 − 2(w · ϕ(x))y        if (w · ϕ(x))y ≤ 0,
    (1 − (w · ϕ(x))y)²      if 0 < (w · ϕ(x))y ≤ 1,
    0                       if (w · ϕ(x))y > 1,

where y ∈ R.

(a) [2 Pts] Compute the gradient ∇w Loss(x, y, w).
(b) [3 Pts] Let d = 2 and ϕ(x) = [1, x]. Consider the following loss function

TrainLoss(w) = (1/2) (Loss(x₁, y₁, w) + Loss(x₂, y₂, w)).

Compute ∇w TrainLoss(w) for the following values

w = (0, 1/2), x₁ = −2, y₁ = 1, x₂ = −1, y₂ = −1.

Conclude by writing down the gradient descent update rule for the TrainLoss loss function.
(c) [2 Pts] Perform two iterations of gradient descent to minimize the TrainLoss objective function with values for x₁, y₁, x₂, y₂ as above. Use the initialization w⁰ = (0, 1/2) and step size η = 1/2.
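A small numerical sketch (not exam content) implementing the piecewise gradient from (a) and running the two gradient descent iterations of (c), so a hand computation can be checked against it:

import numpy as np

def phi(x):
    return np.array([1.0, x])

def grad_loss(x, y, w):
    m = np.dot(w, phi(x)) * y          # margin (w . phi(x)) y
    if m <= 0:
        return -2 * y * phi(x)          # gradient of 1 - 2(w.phi)y
    elif m <= 1:
        return -2 * (1 - m) * y * phi(x)  # gradient of (1 - (w.phi)y)^2
    return np.zeros_like(w)             # flat region

data = [(-2.0, 1.0), (-1.0, -1.0)]  # (x1, y1), (x2, y2)
w = np.array([0.0, 0.5])            # initialization w^0
eta = 0.5                           # step size

for step in range(2):
    g = sum(grad_loss(x, y, w) for x, y in data) / len(data)
    w = w - eta * g                 # update rule: w <- w - eta * grad TrainLoss(w)
    print(step + 1, w)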

1.5 Maximum Likelihood Estimation I


5. [5 Pts] Suppose we are testing a number of new sensors sampled from a population of similar sensors. The goal is to estimate the mean failure time of the population. To do so, we assume an exponential distribution, i.e.

PDF: f(x) = λe^(−λx)     (1.2)

CDF: F(x) = ∫₀ˣ f(t) dt = 1 − e^(−λx)     (1.3)

(a) [3 Pts] Running all sensors until failure at times t₁, t₂, …, tₙ, formulate the likelihood function L(λ; t₁, …, tₙ) for the data.
(b) [2 Pts] Find the maximum likelihood estimate λ̂ of the distribution's parameter.
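As a numerical sanity check (a sketch on simulated data, not exam content): the closed-form MLE for the exponential distribution is λ̂ = n / Σ tᵢ, i.e. the reciprocal of the sample mean, which simulation recovers.

import numpy as np

rng = np.random.default_rng(0)
true_lam = 2.0
t = rng.exponential(scale=1 / true_lam, size=10_000)  # simulated failure times

lam_hat = len(t) / t.sum()  # MLE: lambda_hat = n / sum(t_i) = 1 / mean(t)
print(lam_hat)              # should be close to the true value 2.0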

1.6 Maximum Likelihood Estimation II


6. [3 Pts] Use the maximum likelihood estimation approach, as shown in the lecture for the normal distribution, to find the maximum likelihood estimate of λ in this modified distribution, given n random samples from the distribution with probability density function

f(x) = λ^(x³) e^(−16λ) / x!,    x = 0, 1, 2, …

1.7 Gaussian Process Classification and the Influence of Kernels

7. [5 Pts] In Gaussian process classification we need to use a local Laplace approximation to assign an approximate Gaussian posterior at the training points and satisfy the conditions of a Gaussian process. Yet, we need our data to be in the reproducing kernel Hilbert space for it to be modelable by our Gaussian process. Two popular choices are the radial basis function kernel and the periodic kernel:

kRBF(x, x′) = σ² exp(−(x − x′)² / (2ℓ²)),     (1.4)

kPer(x, x′) = σ² exp(−2 sin²(π|x − x′| / p) / ℓ²).     (1.5)

In which cases should you prefer each one? Please justify your reasoning.
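A compact numpy sketch (hyperparameters σ, ℓ, p chosen arbitrarily) of the two kernels, useful for seeing that the RBF correlation decays monotonically with distance while the periodic kernel returns to σ² at multiples of the period p:

import numpy as np

def k_rbf(x1, x2, sigma=1.0, ell=1.0):
    # Correlation decays monotonically with squared distance
    return sigma**2 * np.exp(-(x1 - x2)**2 / (2 * ell**2))

def k_per(x1, x2, sigma=1.0, ell=1.0, p=2.0):
    # Correlation is periodic in the distance with period p
    return sigma**2 * np.exp(-2 * np.sin(np.pi * np.abs(x1 - x2) / p)**2 / ell**2)

d = np.linspace(0, 6, 7)
print(k_rbf(0.0, d))  # decays towards 0
print(k_per(0.0, d))  # returns to sigma^2 at multiples of p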

1.8 Backprop of MLP with one Hidden Layer


8. [7 Pts] Consider the following classification MLP with one hidden layer, on which you will perform backpropagation:

x = input ∈ R^D,
z = Wx + b₁ ∈ R^K,
h = ReLU(z) ∈ R^K,
a = Vh + b₂ ∈ R^C,
L = CrossEntropy(y, S(a)) ∈ R,

where x ∈ R^D, b₁ ∈ R^K, W ∈ R^(K×D), b₂ ∈ R^C, V ∈ R^(C×K), and where D is the size of the input, K is the number of hidden units, and C is the number of classes. Think of this classification MLP as a feedforward model in the following sense:

L = f₄ ∘ f₃ ∘ f₂ ∘ f₁,

where the cross-entropy loss is given by

CrossEntropy(y, S(a)) = − Σ_(c′ ∈ C) { y_c′ ln(S(a)_c′) + (1 − y_c′) ln(1 − S(a)_c′) },

and S is the softmax activation function

S(x)_c = e^(x_c) / Σ_(c′=1)^C e^(x_c′).

Rely on the principles of backpropagation throughout (a) and (b).

(a) [3 Pts] Calculate the gradients for the inputs and parameters, i.e.

∇V L, ∇W L, ∇x L

(b) [4 Pts] Calculate the gradients of the loss w.r.t. the two layers, i.e.

∇a L, ∇z L
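An illustrative numpy sketch (shapes and random data are assumptions, not exam content) of the forward pass and of the backward pass for exactly the gradients asked for above, using the softmax Jacobian for the given per-class cross-entropy:

import numpy as np

rng = np.random.default_rng(0)
D, K, C = 4, 5, 3   # input size, hidden units, classes (arbitrary)
x = rng.normal(size=D)
y = np.eye(C)[0]    # one-hot target
W, b1 = rng.normal(size=(K, D)), np.zeros(K)
V, b2 = rng.normal(size=(C, K)), np.zeros(C)

# Forward pass
z = W @ x + b1
h = np.maximum(z, 0.0)                  # ReLU
a = V @ h + b2
S = np.exp(a - a.max()); S /= S.sum()   # softmax (numerically stabilized)
L = -np.sum(y * np.log(S) + (1 - y) * np.log(1 - S))

# Backward pass
dL_dS = -y / S + (1 - y) / (1 - S)
J = np.diag(S) - np.outer(S, S)         # softmax Jacobian dS/da (symmetric)
grad_a = J @ dL_dS                      # grad_a L
grad_V = np.outer(grad_a, h)            # grad_V L
grad_z = (V.T @ grad_a) * (z > 0)       # grad_z L (ReLU mask)
grad_W = np.outer(grad_z, x)            # grad_W L
grad_x = W.T @ grad_z                   # grad_x L
print(L, grad_a, grad_z)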

Chapter 2

Practical Part

Disclaimer: The practical part of the exam will contain 60 Pts, whereas only 40 Pts are provided here.

2.1 Predictive Models: Mauna Loa Dataset


[20 Pts] In this question, we will work with the Mauna Loa dataset, which contains atmospheric carbon dioxide (CO2) concentrations derived from the Scripps Institution of Oceanography's continuous monitoring program at Mauna Loa Observatory, Hawaii, between 1958 and 1993. The dataset is available on Moodle.

2.1.1 Bayesian Linear Regression


[4 Pts] Write down and describe the equations involved in the Bayesian approach to linear regression,
which may be used to describe an underlying hidden function of time given noisy data points. You
may assume an independent Gaussian prior over each of the weights and independent noise on each
data point.
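A minimal numpy sketch of those equations under the stated assumptions (isotropic Gaussian prior w ∼ N(0, α⁻¹I), i.i.d. Gaussian noise with precision β, design matrix Φ of basis functions): the posterior over the weights is N(m_N, S_N) with S_N⁻¹ = αI + βΦᵀΦ and m_N = βS_NΦᵀy.

import numpy as np

def blr_posterior(Phi, y, alpha=1.0, beta=25.0):
    # Posterior N(m_N, S_N) over weights for prior N(0, alpha^-1 I)
    # and i.i.d. Gaussian observation noise with precision beta
    S_N_inv = alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi
    S_N = np.linalg.inv(S_N_inv)
    m_N = beta * S_N @ Phi.T @ y
    return m_N, S_N

# Toy example with polynomial basis functions of time (arbitrary data)
t = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * t) + 0.1 * np.random.default_rng(0).normal(size=50)
Phi = np.vander(t, 4, increasing=True)  # basis functions [1, t, t^2, t^3]
m_N, S_N = blr_posterior(Phi, y)
print(m_N)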

2.1.2 Connection between Bayesian Linear Regression and Gaussian Processes
[4 Pts] By considering the covariance of the function in (2.1) at two time points, show how we can
derive an equivalent kernel function using an inner product of basis functions. Hence show that the
Bayesian linear regression approach is equivalent to the Gaussian process approach with a suitable
choice of kernel function.
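As a pointer for the derivation (a standard result, stated here as a hint rather than exam content): with a Gaussian prior w ∼ N(0, α⁻¹I) over the weights and f(t) = wᵀϕ(t), the covariance at two time points is

Cov(f(t), f(t′)) = ϕ(t)ᵀ E[wwᵀ] ϕ(t′) = α⁻¹ ϕ(t)ᵀϕ(t′) =: k(t, t′),

an inner product of basis functions, so the Bayesian linear regression prior is exactly a Gaussian process with this kernel.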

2.1.3 Gaussian Processes


[12 Pts] Using a Gaussian process, construct a predictive model of the CO2 concentration over the subsequent 20 years. Give full details of any assumptions and equations you make use of when fitting your model to the data.
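A sketch of one possible GP fit (assumptions: scikit-learn is available, and the Moodle file is a CSV with hypothetical columns "year" and "co2"; adapt to the real layout). The kernel combines a long-term trend, a yearly seasonal component, and observation noise:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ExpSineSquared, WhiteKernel

# Hypothetical file and column names; adapt to the actual Moodle file.
data = np.genfromtxt("mauna_loa.csv", delimiter=",", names=True)
t = data["year"].reshape(-1, 1)
co2 = data["co2"]

# Long-term trend (RBF) + yearly seasonality (periodic) + observation noise
kernel = 50.0**2 * RBF(length_scale=50.0) \
    + 2.0**2 * RBF(length_scale=100.0) * ExpSineSquared(length_scale=1.0, periodicity=1.0) \
    + WhiteKernel(noise_level=0.2)

gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(t, co2)
t_future = np.linspace(t.max(), t.max() + 20, 240).reshape(-1, 1)
mean, std = gp.predict(t_future, return_std=True)  # predictive mean and uncertainty
print(mean[:5], std[:5])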

2.2 Classification on the OMNI2 dataset


[20 Pts] Solar wind can be divided into four distinct categories: ejecta, coronal hole origin plasma, streamer belt origin plasma, and sector reversal origin plasma. Correctly classifying the different solar wind types is to this day of great interest in astronomy. Here we have a database with about 300k hours of labeled solar wind data computed from OMNI2 for the years 1965-2007, as published by E. Camporeale, A. Care, and J. Borovsky in "Classification of Solar Wind with Machine Learning" in the Journal of Geophysical Research.

2.2.1 Model selection
[4 Pts] Argue which classification model makes sense for such an application. Do not consider the
Gaussian process classification laid out in the respective research paper.

2.2.2 Implementation
[12 Pts] Implement and train the model you described above. Apply and compare different optimization tricks.
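A possible starting point (a sketch, not a definitive solution: the file name, the "label" column, the train/test split, and the MLP architecture are all assumptions) using scikit-learn, with standardization, the Adam solver, and early stopping as optimization tricks to compare:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report

# Hypothetical file/column layout; adapt to the actual OMNI2 export.
data = np.genfromtxt("omni2_labeled.csv", delimiter=",", names=True)
label_col = "label"
X = np.stack([data[n] for n in data.dtype.names if n != label_col], axis=1)
y = data[label_col].astype(int)  # four solar wind classes: 0..3

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# Standardization + Adam + early stopping as the tricks to compare
clf = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(64, 64), solver="adam",
                  early_stopping=True, max_iter=200, random_state=0),
)
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))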

2.2.3 Performance Analysis


[4 Pts] Compare your trained classifier against the classifier of the reference paper and evaluate how their performance and characteristics differ.
