Step 5: Solve for α and β
We solve the system:

α = 4β / (1 + 4β²),   β = 4α / (1 + 4α²)

Substituting the expression for β into the equation for α:

α = 4 · (4α / (1 + 4α²)) / (1 + 4 · (16α² / (1 + 4α²)²))

Simplifying algebraically and solving for α and β numerically, we obtain:

α = β = 2/5.

Thus, the optimal values of α and β that minimize the loss function are:

α = 2/5,   β = 2/5.
Consider a fully connected autoencoder in which each hidden node
is connected to all inputs and all outputs, with a 2-dimensional
binary input and one hidden layer with a tanh activation function.
At iteration t, the weights are as shown below, with inputs x1 = 1
and x2 = 1. Bias values are zero. The learning rate is 0.25 and the
momentum constant is 0.75. At t − 1, w1 = −0.5, w2 = 0.5, w3 = 0.5,
and w4 = −0.5.
The network is:
x1 -- w1 = 0 ---> h, x2 --- w2 = 1 ---> h, h --- w3 = 1 ---> o1, h --- w4 = 0 ---> o2.
Sigmoid activation in the output nodes.
I suppose we can choose binary cross-entropy loss over MSE.
What will be the value of the loss function at iteration t?
What will be the weights w1 and w3 at iteration t + 1, assuming
backpropagation with ordinary gradient descent is used?
Note: d/dx (tanh(x)) = 1 − tanh²(x).
Provide manual calculations and Python code for this.
Let's go step by step through the calculations.
Step 1: Define Network Equations
The network consists of:
1. Input layer: x1 = 1, x2 = 1
2. Hidden layer: h = tanh(w1 x1 + w2 x2 )
3. Output layer: o1 = σ(w3 h), o2 = σ(w4 h)
4. Loss function: Binary Cross-Entropy (BCE), given by:
L = − Σ [ y log(o) + (1 − y) log(1 − o) ]
At iteration t:
w1 = 0, w2 = 1, w3 = 1, w4 = 0.
The biases are zero.
Since this is an autoencoder, the targets are the inputs themselves: y1 = 1, y2 = 1.
Step 2: Forward Pass
Hidden Layer Computation
h = tanh(w1 x1 + w2 x2 ) = tanh(0 ⋅ 1 + 1 ⋅ 1) = tanh(1)
Using tanh(1) ≈ 0.7616, we get:
h ≈ 0.7616
Output Layer Computation
o1 = σ(w3 h) = σ(1 × 0.7616) = σ(0.7616)
o2 = σ(w4 h) = σ(0 × 0.7616) = σ(0)
Using the sigmoid function σ(x) = 1 / (1 + e^(−x)):

For o1:
o1 = 1 / (1 + e^(−0.7616)) ≈ 0.6817

For o2:
o2 = 1 / (1 + e^0) = 0.5
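These activations can be checked quickly in NumPy (a minimal sketch, rounding to four decimals):

import numpy as np
h = np.tanh(1.0)                   # ≈ 0.7616
o1 = 1 / (1 + np.exp(-1.0 * h))    # σ(w3 h) ≈ 0.6817
o2 = 1 / (1 + np.exp(-0.0 * h))    # σ(0) = 0.5
print(round(h, 4), round(o1, 4), round(o2, 4))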
Step 3: Compute Loss
Using binary cross-entropy loss:
L = −(y1 log o1 + (1 − y1 ) log(1 − o1 )) − (y2 log o2 + (1 − y2 ) log(1 − o2 ))
Assuming y1 = 1, y2 = 1:
L = −(log 0.6817 + log 0.5)
L ≈ −(−0.3832 − 0.6931) = 1.0763
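The loss value can be checked directly (a small sketch; natural logarithms and the targets y1 = y2 = 1 assumed above):

import numpy as np
o1, o2 = 0.6817, 0.5
loss = -(np.log(o1) + np.log(o2))   # BCE with both targets equal to 1
print(round(loss, 4))               # ≈ 1.0763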
Step 4: Compute Gradients (Backpropagation)
Output Layer Gradients
For BCE loss with a sigmoid output, the gradient with respect to the pre-activation z (where o = σ(z)) is:
∂L/∂z = o − y
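This follows from the chain rule: ∂L/∂o = −y/o + (1 − y)/(1 − o) and ∂o/∂z = σ'(z) = o(1 − o), so
∂L/∂z = (∂L/∂o)(∂o/∂z) = [−y/o + (1 − y)/(1 − o)] · o(1 − o) = −y(1 − o) + (1 − y)o = o − y.
Defining δ as this pre-activation gradient for each output node: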
δ1 = o1 − y1 = 0.6817 − 1 = −0.3183
δ2 = o2 − y2 = 0.5 − 1 = −0.5
Weight gradients:
∂L/∂w3 = δ1 h = (−0.3183)(0.7616) = −0.2424
∂L/∂w4 = δ2 h = (−0.5)(0.7616) = −0.3808
Hidden Layer Gradients
Backpropagating to hidden layer:
δh = (w3 δ1 + w4 δ2)(1 − h²)
δh = (1 × (−0.3183) + 0 × (−0.5))(1 − 0.7616²)
= (−0.3183)(1 − 0.58)
= (−0.3183)(0.42) = −0.1337
Now compute the weight gradients:
∂L/∂w1 = δh x1 = (−0.1337)(1) = −0.1337
∂L/∂w2 = δh x2 = (−0.1337)(1) = −0.1337
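The analytic gradients above can be sanity-checked numerically. Below is a minimal sketch using central finite differences; the helper name forward_loss and the step size eps are illustrative choices, not part of the original question:

import numpy as np

def forward_loss(w, x1=1, x2=1, y1=1, y2=1):
    # w = [w1, w2, w3, w4]; forward pass and BCE loss for one example
    h = np.tanh(w[0]*x1 + w[1]*x2)
    o1 = 1 / (1 + np.exp(-w[2]*h))
    o2 = 1 / (1 + np.exp(-w[3]*h))
    return -(y1*np.log(o1) + (1 - y1)*np.log(1 - o1)) \
           - (y2*np.log(o2) + (1 - y2)*np.log(1 - o2))

w_t = np.array([0.0, 1.0, 1.0, 0.0])   # weights at iteration t
eps = 1e-6
for i, name in enumerate(["w1", "w2", "w3", "w4"]):
    w_plus, w_minus = w_t.copy(), w_t.copy()
    w_plus[i] += eps
    w_minus[i] -= eps
    grad = (forward_loss(w_plus) - forward_loss(w_minus)) / (2 * eps)
    print(name, round(grad, 4))        # expect ≈ -0.1337, -0.1337, -0.2424, -0.3808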
Step 5: Update Weights
The question asks for ordinary gradient descent for this part, so the momentum term is not applied here. (With momentum, the update would be w(t+1) = w(t) − η ∂L/∂w + μ Δw(t), where:
η = 0.25 (learning rate)
μ = 0.75 (momentum constant)
Δw(t) is the weight change from the previous iteration.)
Using ordinary gradient descent, w(t+1) = w(t) − η ∂L/∂w:

w1(t+1) = w1 − 0.25 × (−0.1337) = 0 + 0.0334 = 0.0334
w3(t+1) = w3 − 0.25 × (−0.2424) = 1 + 0.0606 = 1.0606
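The same arithmetic as a two-line check (a sketch reusing the gradients computed in Step 4):

w1_next = 0 - 0.25 * (-0.1337)   # = 0.0334
w3_next = 1 - 0.25 * (-0.2424)   # = 1.0606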
Step 6: Python Code
Now, let's implement the calculations in Python.
Autoencoder Training
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh_derivative(x):
    return 1 - np.tanh(x)**2

# Given parameters
x1, x2 = 1, 1                    # Input values
y1, y2 = 1, 1                    # Targets (the inputs themselves, as this is an autoencoder)
w1, w2, w3, w4 = 0, 1, 1, 0      # Weights at iteration t
learning_rate = 0.25
momentum = 0.75                  # Given, but not used for the ordinary gradient-descent update
prev_w1, prev_w3 = -0.5, 0.5     # Weights at iteration t-1 (not needed for ordinary GD)

# Forward pass
tanh_h = np.tanh(w1*x1 + w2*x2)
o1 = sigmoid(w3 * tanh_h)
o2 = sigmoid(w4 * tanh_h)

# Compute BCE loss
loss = - (y1 * np.log(o1) + (1 - y1) * np.log(1 - o1)) \
       - (y2 * np.log(o2) + (1 - y2) * np.log(1 - o2))

# Compute gradients (backpropagation)
delta1 = o1 - y1
delta2 = o2 - y2
dL_dw3 = delta1 * tanh_h
dL_dw4 = delta2 * tanh_h

delta_h = (w3 * delta1 + w4 * delta2) * tanh_derivative(w1*x1 + w2*x2)
dL_dw1 = delta_h * x1
dL_dw2 = delta_h * x2
# Update weights with ordinary gradient descent (as the question asks)
w1_new = w1 - learning_rate * dL_dw1
w3_new = w3 - learning_rate * dL_dw3

print(f"Loss at iteration t: {loss:.4f}")
print(f"Updated w1 at iteration t+1: {w1_new:.4f}")
print(f"Updated w3 at iteration t+1: {w3_new:.4f}")
The loss at iteration t is approximately 1.0763, and the updated weights at t + 1 are:
w1(t+1) = 0.0334
w3(t+1) = 1.0606
The provided Python code implements these calculations. Let me know if you need further
clarifications! 🚀