
Final Exam

CSED105 Introduction to Artificial Intelligence


Instructor: Jungseul Ok
[email protected]
TAs: Yunjoo Lee, Jeongyeon Hwang, Seockbean Song, Youngjae Kim, Junhyuk So

10:00am - 12:00pm, June 5, 2024

Remarks

• You need to provide proper justification for your answers.

• It may be helpful to know that sufficient blank space is left for your answer to each
problem; of course, you can use more if you need it.

• This is a closed-book exam: you are NOT allowed to consult any material other than
this exam paper. You cannot discuss with other students. Any violation may result not
only in a grade of F but also in a report to the disciplinary committee.

• You can write your answers in English or Korean.

No.    Score                       Comment
1
2      (a) (b) (c) (d) (e) (f)
3      (a) (b) (c)
4      (a) (b) (c)
5      (a) (b) (c)
6      (a) (b)
7      (a) (b) (c) (d)
Total

Name:

Student ID:

1 True/False Questions (2 × 12 = 24 pt)
Check whether the following statements are True or False.

(a) (True / False) Examples of unsupervised learning include clustering, generative models and dimensionality reduction.
True

(b) (True / False) K-means clustering can be used for tasks such as image segmentation.
True

(c) (True / False) Diffusion models can be used to generate images.


True

(d) (True / False) The primary difference between a variational autoencoder (VAE) and a
traditional autoencoder (AE) is that a VAE learns a probabilistic latent space, while an AE
learns a deterministic latent space.
True

(e) (True / False) The goal of reinforcement learning is to learn the optimal policy that maxi-
mizes cumulative reward.
True

(f) (True / False) Reinforcement learning requires labeled data similar to supervised learning.
False

(g) (True / False) In computer vision, multilayer perceptrons are generally superior to convo-
lutional neural networks.
False

(h) (True / False) Recurrent neural networks are employed in natural language processing to
manage inputs of variable length.
True

(i) (True / False) Recurrent neural networks have the advantage of being parallelizable, leading
to faster processing speed.
False

(j) (True / False) Self-attention is a mechanism to compute the relationships between elements
within the same input sequence.
True

(k) (True / False) There have been two AI winters in the history of artificial intelligence research.
True

(l) (True / False) The current state of deep learning is no longer constrained by data.
False

2 K-means Algorithm (2 + 3 + 3 + 2 + 4 + 2 = 16 pt)
Given a dataset D = {x(i)}i=1,2,...,m of m data points in R², we want to partition them into K clusters using the K-means algorithm. Let µk ∈ R² denote a representative point of cluster k ∈ {1, 2, ..., K}. Indicating the assignment of data point x(i) by a one-hot vector r(i) = (r_1^(i), ..., r_K^(i)) ∈ {0, 1}^K such that Σ_{k=1}^K r_k^(i) = 1 and r_k^(i) = 1 only if x(i) is assigned to cluster k, the K-means algorithm aims at finding {r(i)}i=1,...,m and {µk}k=1,...,K from the following optimization problem:

$$\min_{\{r^{(i)}\in\{0,1\}^K\},\,\{\mu_k\in\mathbb{R}^2\}}\ \sum_{i=1}^{m}\sum_{k=1}^{K}\frac{1}{2}\,r_k^{(i)}\,\lVert x^{(i)}-\mu_k\rVert_2^2 \quad\text{subject to}\quad \sum_{k=1}^{K} r_k^{(i)}=1\ \ \forall i\in\{1,\dots,m\}. \qquad (1)$$

To be specific, the K-means algorithm iterates (i) the assignment step to optimize the r(i)'s given the µk's; and (ii) the update step to optimize the µk's given the r(i)'s.

(a) Suppose we want to perform the K-means algorithm for K = 2 with the dataset D and the initialization of the µk's given as follows:

D = {(−1, 2), (0, 2), (1, 2), (3, 2), (5, 2), (−1, 0), (0, 0), (1, 0), (3, 0), (5, 0)} ,  (2)
µ1 = (1, −1) and µ2 = (3, 0) .  (3)

On Figure 1, plot the dataset D in (2) with circles (◦).

Figure 1: Grid for Problem 2(a)

(b) Which of the following corresponds to the µk's that the update step finds?

$$\text{(i)}\ \mu_k=\frac{\sum_i r_k^{(i)}}{\sum_i r_k^{(i)}x^{(i)}} \qquad \text{(ii)}\ \mu_k=\frac{\sum_i x^{(i)}}{\sum_i r_k^{(i)}} \qquad \text{(iii)}\ \mu_k=\frac{\sum_i r_k^{(i)}x^{(i)}}{\sum_i r_k^{(i)}} \qquad \text{(iv)}\ \mu_k=\frac{\sum_i r_k^{(i)}x^{(i)}}{\sum_i x^{(i)}}$$

Describe the meaning of the µk that you selected for the update step.
The update step would find µk in (iii) (+2pt), which is the centroid of the points assigned to cluster k (+1pt).

(c) Given the initialization in (3), compute the values of Σi r_1^(i) and Σi r_2^(i) after the first assignment step.
5, 5 (+1.5pt each)

(d) Given the initialization in (3), find the µ1 and µ2 that the K-means algorithm would converge to.
{(0, 1), (4, 1)} (+1pt each)

(e) Compute the values of the loss function in (1) for (i) the µk's in (3); and (ii) those you found in Problem 2(d).
(i) 27.5 (ii) 9 (+2pt each; -1pt for a minor mistake, e.g., square rooting, wrong addition, ...)
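For reference, the numbers in (c)-(e) can be reproduced with a short script. Below is a minimal sketch of the assignment and update steps on the data in (2) with the initialization in (3); it is an illustration, not part of the exam, and the helper names are ours.

import numpy as np

# Dataset D from (2) and initialization of mu_1, mu_2 from (3).
X = np.array([(-1, 2), (0, 2), (1, 2), (3, 2), (5, 2),
              (-1, 0), (0, 0), (1, 0), (3, 0), (5, 0)], dtype=float)
mu = np.array([(1, -1), (3, 0)], dtype=float)

def assign_step(X, mu):
    # Assignment step: each point goes to its nearest centroid.
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def loss(X, mu, a):
    # Objective (1): one half of the sum of squared distances.
    return 0.5 * np.sum((X - mu[a]) ** 2)

a = assign_step(X, mu)
print(np.bincount(a))   # first assignment: [5 5], as in (c)
print(loss(X, mu, a))   # loss with the mu's from (3): 27.5, as in (e)(i)

for _ in range(10):     # iterate the two steps until convergence
    mu = np.array([X[a == k].mean(axis=0) for k in range(2)])  # update step
    a = assign_step(X, mu)

print(mu)               # converged centroids (0, 1) and (4, 1), as in (d)
print(loss(X, mu, a))   # final loss: 9.0, as in (e)(ii)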

(f) Describe two examples where the K-means algorithm can be applied.
Good example with appropriate reason (segmentation, recommendation, ...) (+1pt each)

3 Variational Auto-Encoder (VAE) (3 + 3 + 3 = 9 pt)
Consider the MNIST dataset D = {x(i)}i=1,...,m consisting of gray-scale images of handwritten digits, and a VAE model consisting of an encoder f and a decoder g, which are multilayer perceptrons. To be specific, in training, we encode x(i) using f(x(i)) = (µ(i), (σ(i))²) to generate a 3-dimensional random latent code z(i) ∼ N(µ(i), (σ(i))²), and decode z(i) to reconstruct x(i) using g(z(i)) = x̂(i), where our learning objective can be informally described as follows:

$$\min_{f,g}\ \frac{1}{m}\sum_{i=1}^{m}\underbrace{\lVert x^{(i)}-g(z^{(i)})\rVert^2}_{\text{reconstruction error}}+\lambda\,\underbrace{\mathrm{KL}\big(\mathcal{N}(\mu^{(i)},(\sigma^{(i)})^2)\,\big\Vert\,\mathcal{N}(0,I)\big)}_{\text{regularization}} \qquad (4)$$
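For concreteness, the objective in (4) can be written as a minimal numpy sketch, assuming a diagonal Gaussian encoder so that the KL term takes its standard closed form ½ Σj (µj² + σj² − log σj² − 1); the function below is an illustration, not the exam's prescribed implementation.

import numpy as np

def vae_loss(x, x_hat, mu, sigma, lam=1.0):
    # Reconstruction error: squared distance between input and reconstruction.
    recon = np.sum((x - x_hat) ** 2)
    # KL(N(mu, diag(sigma^2)) || N(0, I)), closed form for diagonal Gaussians.
    kl = 0.5 * np.sum(mu ** 2 + sigma ** 2 - np.log(sigma ** 2) - 1.0)
    return recon + lam * kl

# In training, z is sampled with the reparameterization trick,
# z = mu + sigma * eps with eps ~ N(0, I), so gradients can reach f.
rng = np.random.default_rng(0)
mu, sigma = np.zeros(3), np.ones(3)   # 3-dimensional latent code, as above
z = mu + sigma * rng.standard_normal(3)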

(a) Figure 2 visualizes the latent space when the model is trained with λ values of 0 and 1. Identify the λ value corresponding to each latent space: (left): λ = ___ , (right): λ = ___
(left): λ = 1, (right): λ = 0 (3pt for the correct answer; 0pt if any part is wrong, e.g., (left) = 1, (right) = 1)

Figure 2: Visualizations of latent space

(b) Describe the meanings of the reconstruction error term and the regularization term in (4), respectively.
The reconstruction error term measures the discrepancy between the original x(i) and the reconstructed x̂(i) = g(z(i)), and the regularization term measures the distance between the distribution of z(i) and the zero-mean normal distribution N(0, I).
Two correct descriptions: (3pt)
One correct description: (1.5pt)
Note: A weak or ambiguous description is not accepted.

(c) Figure 3 visualizes two images of 4: x1 and x2. Considering that N(f(x); 0, I) models the likelihood of data x, compare the likelihoods N(f(x1); 0, I) and N(f(x2); 0, I). Justify your answer briefly.

Figure 3: Two images of digit 4: (a) x1, (b) x2

N(f(x1); 0, I) > N(f(x2); 0, I), since x2 is an outlier.


Correct answer and justification: (3pt)
Correct answer and no or wrong justification: (1pt)
Wrong answer regardless of justification: (0pt)

4 Markov Property (3 + 3 + 3 = 9 pt)
Consider a reinforcement learning task, where a robot with a local view is looking for the shortest path in the maze from start to end. At each time t = 0, 1, ..., the robot observes local view ot, and selects an action at ∈ {N(orth), E(ast), S(outh)} to move to one of the neighboring white tiles in Figure 4. The robot can perceive one of the seven observations in Figure 5. To solve this problem of finding the shortest path, we give reward rt+1 = +1 only when the robot reaches the end, and −1 otherwise.

Figure 4: An example sequence of (ot , at , rt+1 , ot+1 ).

Figure 5: All possible observations.


(a) State the definition of Markov property.
The Markov property states that the next state of a stochastic process depends only upon
the present state, and is independent of the sequence of events that preceded it, i.e.,
P (st+1 , rt+1 |ht ) = P (st+1 , rt+1 |st , at ) where ht = (s0 , a0 , r1 , ..., st , at ).

(b) Suppose that we define state st = ot at time t in the agent-environment interface, i.e., (st, at, rt+1) = (ot, at, rt+1). Justify whether this definition of state satisfies the Markov property.
The Markov property is not satisfied in this scenario. For example, when the robot perceives st = (III) at time t, we cannot predict the next state st+1 from st with at = E because there are two possible cases, e.g., st+1 = (III) or st+1 = (VII). However, when we know the history ht, we can predict the next state st+1 given action at, i.e., P(st+1, rt+1 | ht) ≠ P(st+1, rt+1 | st, at).
(+1pt for identifying no Markov property; +2pt for a proper justification)

(c) Suppose that we define state st = ot−L:t at time t in the agent-environment interface, where L ≥ 1, i.e., (ot−L:t, at, rt+1, ot+1−L:t+1) = (st, at, rt+1, st+1), where ot−L:t is the stack of the (L + 1) most recent observations, i.e., ot−L:t := (ot′)t′∈[max{0,t−L},t]. Check whether there exists a finite constant L that makes this definition of state satisfy the Markov property. Justify your answer.
The Markov property is satisfied when L ≥ 1. If we have at least 2 recent observations (L ≥ 1), we can predict the next state st+1 with at based solely on st, independently of the previous history. Consider L = 1. When st = ot−1:t is ((VI), (III)), then st+1 will be ((III), (III)) with at = E. Similarly, when st = ot−1:t is ((III), (III)), then st+1 will be ((III), (VII)) with at = E. This resolves the problem in (b).
Note that we do not have the action W(est).
(+1pt for identifying the Markov property; +2pt for a proper justification)
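As a side note, the stacked state st = ot−L:t in (c) is easy to realize with a fixed-length buffer. Below is a minimal sketch; the observation labels follow Figure 5, and make_state is a hypothetical helper, not part of the exam.

from collections import deque

L = 1                          # keep the (L + 1) most recent observations
window = deque(maxlen=L + 1)   # old observations are dropped automatically

def make_state(o_t):
    # Append the newest observation; the deque discards o_{t-L-1} by itself.
    window.append(o_t)
    return tuple(window)       # s_t = o_{t-L:t} (shorter at the first steps)

# Example from the solution: observations (VI), (III), (III).
for o in ["VI", "III", "III"]:
    print(make_state(o))       # ('VI',) then ('VI', 'III') then ('III', 'III')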

5 Convolution (5 + 5 + 5 = 15 pt)
In the following questions, you will practice convolution. Let X and W be two real-valued discrete functions. The convolution of X and W is denoted by X ∗ W such that

$$(X * W)[n] = \sum_{k=-\infty}^{\infty} X[k]\, W[n-k] .$$

(a) Given the following 1D input signal X[n] and two filters (kernels) which are W1 [n] and
W2 [n]:

X = [1, 2, 3, 10, 11, 12, 13]


W1 = [1, 0, −1], W2 = [0.25, 0.5, 0.25]

Perform the convolution of the input signal X with each filter W1 and W2 for n = 4, 5, ..., 8.
Note 1 : Except for the given values of X[n], W1 [n], and W2 [n], all other values are 0.
Note 2 : The index of X[n], W1 [n], and W2 [n] in the given values starts from 1, such that
X[1] = 1, X[2] = 2, · · · , W1 [1] = 1, W1 [2] = 0, · · ·.

(i) X ∗ W1 :

(ii) X ∗ W2 :

(2.5pt) (i) X ∗ W1 = [2, 8, 8, 2, 2]
(2.5pt) (ii) X ∗ W2 = [2, 4.5, 8.5, 11, 12]
(1.5pt each in case of a calculation mistake)
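These values can be double-checked with numpy's full convolution. Under the 1-based indexing of Note 2, output entries n = 4, ..., 8 correspond to indices 2-6 of the (0-based) array returned by np.convolve; the check below is an illustration only.

import numpy as np

X  = [1, 2, 3, 10, 11, 12, 13]
W1 = [1, 0, -1]
W2 = [0.25, 0.5, 0.25]

# np.convolve computes the full convolution sum_k X[k] W[n - k]; with the
# 1-based indexing above, 0-based output index j corresponds to n = j + 2.
print(np.convolve(X, W1)[2:7])   # [2 8 8 2 2]           -> X * W1, n = 4..8
print(np.convolve(X, W2)[2:7])   # [2. 4.5 8.5 11. 12.]  -> X * W2, n = 4..8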

(b) Explain the purpose and effect of each filter on the input signal. (hint: what kind of feature
or characteristic do W1 and W2 detect in the input signal X?)

(i) W1 :
(2.5pt)(i) W1 detects sudden changes in intensity in the input signal.
(1.5pt) other general explanation of what filter W1 does
(ii) W2 :
(2.5pt) (ii) W2 performs smoothing or blurring of the input signal.
(1.5pt) other general explanation of what filter W2 does

(c) Explain an advantage of using Convolutional Neural Networks (CNNs) over Multi-Layer
Perceptrons (MLPs) for processing image data.
(5pt)(i) Capture hierarchical features in images through convolutional layers, enabling un-
derstanding at different abstraction levels.
(ii) Share parameters across spatial locations, reducing redundancy and improving efficiency.
(iii) Exploit local connectivity and learn translation-invariant features, making them robust
to variations in object position.
(2.5pt) other general explanation of CNN or MLP but lacking reasoning

6 Transformer and attention mechanism (9 + 6 = 15 pt)
The Transformer is a well-known neural network architecture leveraging the advantage of the attention mechanism, whose output given query Q, key K, and value V is computed as follows:

$$\mathrm{AttentionLayer}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q \cdot K^T}{\text{scaling}}\right)\cdot V$$
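A minimal numpy sketch of this layer, assuming the common choice of scaling = √dk where dk is the key dimension (the generic "scaling" above leaves this open):

import numpy as np

def attention_layer(Q, K, V):
    # Scores: dot-product of queries with keys, scaled by sqrt(d_k).
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax turns scores into attention weights summing to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Output: attention-weighted average of the values.
    return weights @ V

Q = np.random.randn(4, 8)    # 4 queries of dimension 8
K = np.random.randn(6, 8)    # 6 keys of dimension 8
V = np.random.randn(6, 16)   # 6 values of dimension 16
print(attention_layer(Q, K, V).shape)   # (4, 16)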

(a) Complete the following illustration of the attention mechanism by filling the six blanks with
terms in {softmax, K, V, dot-product, scaling}. (Note: a term can appear multiple times.)

(i):

(ii):

(iii):

(iv):

(v):

(vi):

(1.5pt) (i) : K
(1.5pt) (ii) : V
(1.5pt) (iii) : dot-product
(1.5pt) (iv) : scaling
(1.5pt) (v) : softmax
(1.5pt) (vi) : dot-product

(b) Describe the purpose of positional encoding in the transformer architecture for natural
language processing.
(5pt) To provide the sequential order information of the input sequence to the model. / To
enable the model to understand contextual information of the input sequence.
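For reference, the sinusoidal encoding from the original Transformer paper is one standard way to inject this order information; a minimal sketch (the encoding is added to the token embeddings before the first layer, and d_model is assumed even):

import numpy as np

def positional_encoding(seq_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    pos = np.arange(seq_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

print(positional_encoding(seq_len=10, d_model=16).shape)   # (10, 16)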

7 Limitations of AI (3 + 3 + 3 + 3 = 12 pt)
Although artificial intelligence (AI) has achieved remarkable advancements, the journey has not
always been straightforward.

(a) Explain what the universal approximation theorem is.

It states that a neural network with a single hidden layer (of sufficient width) can approximate any continuous function to arbitrary precision.

(b) Explain the limitations of the universal approximation theorem.

It only guarantees the existence of such a network without giving a way to obtain it.
That is, it does not mean we can actually find such a neural network, and there is no guarantee on its generalization ability, i.e., the neural network may just be a memorizer of the dataset.

(c) Explain a lesson learned from the AI winter periods in terms of understanding the limitations
of AI technologies.

The AI community learned that over-promising results can lead to unrealistic expectations
and subsequent disillusionment. Early AI pioneers were overly optimistic about the poten-
tial for rapid advancements and the timelines for achieving human-level intelligence. This
mismatch between promises and actual achievements led to a loss of credibility and support
from both the public and funding agencies.

(d) Discuss a limitation of current large language models.

Hallucination, lack of reasoning, heavy computation requirements.

