Hyperparameter Tuning and Batch Normalization

Constant Learning Rate

Until now, throughout the training process, the learning rate remained constant.

When the path taken by gradient descent approaches the minimum, it might overshoot the minimum simply because the update step is still large.

It then takes additional steps to come back toward the minimum.

With a constant learning rate, gradient descent therefore tends to oscillate when it is about to converge.

The training can start with a large learning rate, since the randomly initialized weights will be far from optimal. At later stages, the learning rate can be decreased to allow more fine-grained weight updates.
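
As a rough illustration of this idea, here is a minimal Python sketch of two common learning-rate decay schedules (the schedules and constants are illustrative choices, not part of the hands-on below):

def time_based_decay(initial_lr, decay_rate, epoch):
    # learning rate shrinks as 1 / (1 + decay_rate * epoch)
    return initial_lr / (1.0 + decay_rate * epoch)

def exponential_decay(initial_lr, decay_rate, epoch):
    # learning rate shrinks geometrically: lr0 * decay_rate ** epoch
    return initial_lr * decay_rate ** epoch

for epoch in [0, 10, 20, 30, 40]:
    print(epoch, time_based_decay(0.1, 0.05, epoch), exponential_decay(0.1, 0.95, epoch))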

Batch Normalization
From the previous course, Building Effective Deep Neural Network, you have seen how normalizing the inputs helps the network train faster.

This concept can be extended to each layer of the network by normalizing the output of the previous layer before feeding it to the next layer. This technique is known as batch normalization.

Covariate Shift
Covariate shift is one of the major problems addressed by batch normalization.

For example, if you have trained a network to identify human faces using grayscale images and then test the model on colored images, the network might not perform well, since there is a large difference in pixel values between the training and test data.

This problem is known as covariate shift: the input data distribution shifts, but the ground truth remains the same.

Internal Covariate Shift


Covariate shift can also happen between the layers of the network as data flows through them.

In mini-batch gradient descent, since each batch is made of a random set of samples, the current mini-batch might have a different distribution from the previous one.

This change in distribution is reflected in the outputs of the subsequent layers. In addition, when the parameters of the previous layers get updated, the input distribution of the current layer changes as well.

In batch normalization, each layer keeps its input distribution stable by normalizing the pre-activation values before applying the activation.

For the current mini-batch of size m, batch normalization computes:

μ = (1/m) Σ_i Z^(i)
σ² = (1/m) Σ_i (Z^(i) − μ)²
Z_norm^(i) = (Z^(i) − μ) / √(σ² + ε)
Z̃^(i) = γ · Z_norm^(i) + β
How Does it Work?

The equations shown above perform the following operations:

Calculate the mean μ of the mini-batch.

Calculate the variance σ² of the mini-batch.

Calculate Z_norm by subtracting the mean from Z and dividing by the standard deviation. A small number, epsilon (ε), is added inside the square root to prevent division by zero. The distribution now has zero mean and unit variance.

Calculate Z̃ by multiplying Z_norm with a scale γ and adding a shift β, and use Z̃ in place of Z as the input to the nonlinearity (e.g. ReLU). The two parameters γ and β are learned during training along with the parameters W and b.
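
To make these steps concrete, here is a minimal NumPy sketch of the batch-norm transformation applied to the pre-activations Z of one layer for a single mini-batch (illustrative only; the hands-on below uses TensorFlow's built-in tf.layers.batch_normalization instead). The shape convention (units, batch size) matches the one used later.

import numpy as np

def batch_norm_forward(Z, gamma, beta, epsilon=1e-8):
    mu = np.mean(Z, axis=1, keepdims=True)        # mini-batch mean per unit
    var = np.var(Z, axis=1, keepdims=True)        # mini-batch variance per unit
    Z_norm = (Z - mu) / np.sqrt(var + epsilon)    # zero mean, unit variance
    return gamma * Z_norm + beta                  # learned scale and shift

Z = np.random.randn(3, 8)        # 3 units, mini-batch of 8 samples
gamma = np.ones((3, 1))          # scale, initialized to ones
beta = np.zeros((3, 1))          # shift, initialized to zeros
Z_tilde = batch_norm_forward(Z, gamma, beta)
print(Z_tilde.mean(axis=1))      # approximately zero for each unit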

Parameters vs Hyperparameters

So far, you have come across several parameters and hyperparameters.

Parameters are the values learned by the network through gradient descent.

Weights W, biases b, and the scaling parameters γ and β (the ones you learned about in batch normalization) are parameters that you initialize and then leave for the network to learn.

Hyperparameters, on the other hand, cannot be learned from the data; they have to be tried with different values within some range until the model performs well.

Hyperparameters

Some of the important hyperparameters you have learned so far are:

learning rate

β, the parameter for gradient descent with momentum

number of nodes in each layer

number of layers

mini-batch size

β₁, β₂, and ε for the Adam optimizer

Selecting Hyperparameters

When training the model, you have to try various combinations of hyperparameters and come up with the set on which the model performs best.

Grid Search: In a grid search, you arrange the hyperparameter values in the form of a grid (matrix) and train a model for each combination.

In a real scenario, when trying out more than two hyperparameters, the grid becomes multidimensional. The main problem with this approach is that you end up training a large number of models, many of which achieve roughly the same accuracy.

This works well when the number of hyperparameters is small.

Random Search: In this approach, instead of iterating through every combination, you randomly select a limited number of combinations to train the model.

Though you do not iterate over all possible combinations, the chances of finding a near-optimal combination are still high.

This is helpful when you have a large number of hyperparameters to tune.
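
For illustration, here is a small Python sketch contrasting the two strategies over two hypothetical hyperparameters (the ranges and counts are made up for the example):

import itertools
import numpy as np

learning_rates = [0.001, 0.01, 0.1]
batch_sizes = [64, 128, 256]

# Grid search: train one model per combination (9 models here).
grid = list(itertools.product(learning_rates, batch_sizes))

# Random search: sample a limited number of combinations from the same ranges.
rng = np.random.RandomState(0)
random_trials = [(float(10 ** rng.uniform(-3, -1)), int(rng.choice(batch_sizes)))
                 for _ in range(4)]

print(len(grid), "grid search combinations")
print(len(random_trials), "random search combinations:", random_trials)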

Choosing Appropriate Scale

Hyperparameters like the number of nodes and the number of layers can be searched on a linear scale, since their range is small.

The model's performance is sensitive to small changes in values such as the learning rate α or the momentum parameter β, so searching for them on a linear scale would be a bad idea; a logarithmic scale works better.
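
A minimal sketch of searching on a logarithmic scale, assuming we want the learning rate α roughly between 0.0001 and 1 and the momentum parameter β between 0.9 and 0.999 (the ranges are illustrative):

import numpy as np

rng = np.random.RandomState(1)

r = rng.uniform(-4, 0, size=5)       # exponent sampled uniformly in [-4, 0]
learning_rates = 10 ** r             # learning rate between 1e-4 and 1

s = rng.uniform(-3, -1, size=5)      # exponent for 1 - beta
betas = 1 - 10 ** s                  # beta between 0.9 and 0.999

print(learning_rates)
print(betas)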

To summarize what you have learned so far:

GD with momentum: avoids too many oscillations in the path taken by gradient descent.

RMSProp: a technique to keep the step size balanced - it decreases the step size for large gradients and increases it for vanishing gradients.

Adam optimizer: an algorithm that combines the elements of momentum and RMSProp.

Batch normalization: prevents covariate shift by normalizing the inputs to each activation.

Hyperparameter tuning: methods to search for optimal hyperparameters.

Learning rate decay: how to control the learning rate to prevent large gradient steps during the later stages of training.

Welcome to the first hands-on!

In this hands-on you will build a deep neural network that integrates batch normalization.
You will also implement mini-batch gradient descent and L2 regularization to train your network.
Follow the instructions provided for each cell to write the code.
Run the cell below to import the necessary packages to read and visualize the data.

In [1]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors

The data is provided as a file named 'data.csv'.

Using pandas, read the csv file and assign the resulting dataframe to the variable 'data'.
For example, if the file name is 'xyz.csv', read it as pd.read_csv('xyz.csv').
In [2]: data = pd.read_csv('../input/data (1).csv')
data.head()

Out[2]:
   feature1  feature2  target
0 -0.260842  0.965382     0.0
1  0.880000  0.000000     1.0
2 -0.942991 -0.332820     0.0
3  0.309017  0.951057     0.0
4 -0.691934 -0.543716     1.0

Extract the feature1 and feature2 values from the dataframe 'data' and assign them to the variable 'X'.
Extract the target variable 'target' and assign it to the variable 'y'.
Hint:
Use .values to extract values from the dataframe.

In [3]: X = data.loc[:, data.columns != 'target'].values
y = data['target'].values

Run the cell below to visualize the data in the x-y plane (the visualization code has been written for you).
The green spots correspond to target value 0 and the blue spots correspond to target value 1.

In [4]: colors=['green','blue']
cmap = matplotlib.colors.ListedColormap(colors)
#Plot the figure
plt.figure()
plt.title('Non-linearly separable classes')
plt.scatter(X[:, 0], X[:, 1], marker='o', c=y, cmap=cmap,
s=25, edgecolor='k')
plt.show()
In [5]: from pandas.plotting import scatter_matrix
%matplotlib inline
color_wheel = {0: "#0392cf",
1: "#7bc043",
}

colors_mapped = data["target"].map(lambda x: color_wheel.get(x))

axes_matrix = scatter_matrix(data.loc[:, data.columns != 'target'], alpha=0.2,
                             figsize=(10, 10), color=colors_mapped)

In order to feed the network, the input has to be of shape (number of features, number of samples)
and the target should be of shape (1, number of samples).
Transpose X and assign it to the variable 'X_data'.
Reshape y to have shape (1, number of samples) and assign it to the variable 'y_data'.
In [6]: X_data = X.T
y_data = y.reshape(1, -1)

assert X_data.shape == (2, 1000)
assert y_data.shape == (1, 1000)

Define the network dimensions to have two input features, four hidden layers with 20 nodes each, and one output node in the final layer.

In [7]: layer_dims = [2, 20, 20, 20, 20, 1]

Import tensorflow as tf

In [8]: import tensorflow as tf

Define a function named placeholders to return two placeholders: one for the input data as A_0 and one for the output data as Y.

Set the datatype of the placeholders to float32.

Parameters - num_features
Returns - A_0 with shape (num_features, None) and Y with shape (1, None)

In [9]: def placeholders(num_features):
    A_0 = tf.placeholder(shape=[num_features, None], dtype=tf.float32)
    Y = tf.placeholder(shape=[1, None], dtype=tf.float32)
    return A_0, Y

Define a function named initialize_parameters_deep() to initialize the weights and biases for each layer.

Use tf.get_variable to initialize the weights and biases, and set the datatype to float32.

Make sure you use Xavier initialization for the weights and initialize the biases to zeros.
Parameters - layer_dims
Returns - dictionary of weights and biases
In [10]: def initialize_parameters_deep(layer_dims):
    tf.set_random_seed(1)
    L = len(layer_dims)
    parameters = {}
    for l in range(1, L):
        parameters['W' + str(l)] = tf.get_variable('W' + str(l),
                                                   shape=[layer_dims[l], layer_dims[l-1]],
                                                   dtype=tf.float32,
                                                   initializer=tf.contrib.layers.xavier_initializer())
        parameters['b' + str(l)] = tf.get_variable('b' + str(l),
                                                   shape=[layer_dims[l], 1],
                                                   dtype=tf.float32,
                                                   initializer=tf.zeros_initializer())
    return parameters

Define a function named linear_forward_prop() to define forward propagation for a given layer.

parameters: A_prev (output from the previous layer), W (weight matrix of the current layer), b (bias vector of the
current layer), activation (type of activation to be used for the output of the current layer)
returns: A (output of the current layer)
Use relu activation for the hidden layers; for the final output layer (activation 'sigmoid'), return the output
unactivated.
After computing the linear output Z, implement batch normalization before feeding it to the activation function; set
training = True and axis = 0.

In [11]: def linear_forward_prop(A_prev, W, b, activation):
    # linear step of forward propagation
    Z = tf.add(tf.matmul(W, A_prev), b)
    # implement batch normalization on Z before the activation
    Z = tf.layers.batch_normalization(inputs=Z, axis=0, training=True,
                                      gamma_initializer=tf.ones_initializer(),
                                      beta_initializer=tf.zeros_initializer())
    if activation == "sigmoid":
        A = Z            # return the unactivated output; the sigmoid is applied inside the loss
    elif activation == "relu":
        A = tf.nn.relu(Z)
    return A

Define forward propagation for the entire network as l_layer_forwardProp().

Parameters: A_0 (input data), parameters (dictionary of weights and biases)

returns: A (output of the final layer)
In [12]: def l_layer_forwardProp(A_0, parameters):
    A = A_0
    L = len(parameters) // 2
    for l in range(1, L):
        A_prev = A
        # call linear forward prop with relu activation for the hidden layers
        A = linear_forward_prop(A_prev, parameters['W' + str(l)], parameters['b' + str(l)], activation='relu')
    # call linear forward prop with 'sigmoid' for the final layer (output returned unactivated)
    A = linear_forward_prop(A, parameters['W' + str(L)], parameters['b' + str(L)], activation='sigmoid')
    return A

Define the cost function.

parameters:
Z_final: output from the final layer
Y: actual output
parameters: dictionary of weights and biases
regularization: boolean
lambd: regularization parameter
First define the original cost using tensorflow's sigmoid_cross_entropy function.
If regularization == True, add the regularization term to the original cost.

In [13]: def final_cost(Z_final, Y, parameters, regularization=False, lambd=0):
    cost = tf.nn.sigmoid_cross_entropy_with_logits(logits=Z_final, labels=Y)
    if regularization:
        reg_term = 0
        L = len(parameters) // 2
        for l in range(1, L + 1):
            # add the L2 loss term for each weight matrix
            reg_term += tf.nn.l2_loss(parameters['W' + str(l)])
        cost = cost + (lambd / 2) * reg_term
    return tf.reduce_mean(cost)

Define the function to generate mini-batches.


In [14]: import numpy as np
def random_samples_minibatch(X, Y, batch_size, seed=1):
    np.random.seed(seed)
    m = X.shape[1]                        # number of samples
    num_batches = int(m / batch_size)     # number of batches derived from batch_size

    indices = np.random.permutation(m)    # generate random indices
    shuffle_X = X[:, indices]
    shuffle_Y = Y[:, indices]
    mini_batches = []

    # generate mini-batches
    for i in range(num_batches):
        X_batch = shuffle_X[:, i * batch_size:(i + 1) * batch_size]
        Y_batch = shuffle_Y[:, i * batch_size:(i + 1) * batch_size]
        assert X_batch.shape == (X.shape[0], batch_size)
        assert Y_batch.shape == (Y.shape[0], batch_size)
        mini_batches.append((X_batch, Y_batch))

    # generate a batch with the remaining samples
    if m % batch_size != 0:
        X_batch = shuffle_X[:, (num_batches * batch_size):]
        Y_batch = shuffle_Y[:, (num_batches * batch_size):]
        mini_batches.append((X_batch, Y_batch))
    return mini_batches

Define the model to train the network using mini-batches.

parameters:
X_train, Y_train: input and target data
layer_dims: network configuration
learning_rate
num_iter: number of epochs
mini_batch_size: number of samples in each mini-batch
returns: dictionary of trained parameters
In [15]: def model_with_minibatch(X_train, Y_train, layer_dims, learning_rate, num_iter, mini_batch_size):
    tf.reset_default_graph()
    num_features, num_samples = X_train.shape

    # call placeholders() to initialize placeholders A_0 and Y
    A_0, Y = placeholders(num_features)
    # initialize weights and biases using initialize_parameters_deep()
    parameters = initialize_parameters_deep(layer_dims)
    # call l_layer_forwardProp() to define the final output
    Z_final = l_layer_forwardProp(A_0, parameters)
    # call final_cost() with regularization set to True
    cost = final_cost(Z_final, Y, parameters, regularization=True)

    # use Adam optimization to train the network
    train_net = tf.train.AdamOptimizer(learning_rate, beta1=0.9, beta2=0.999).minimize(cost)

    seed = 1
    num_minibatches = int(num_samples / mini_batch_size)
    init = tf.global_variables_initializer()
    costs = []
    with tf.Session() as sess:
        sess.run(init)
        for epoch in range(num_iter):
            epoch_cost = 0
            # call random_samples_minibatch() to return the mini-batches
            mini_batches = random_samples_minibatch(X_train, Y_train, mini_batch_size, seed)
            seed = seed + 1

            # perform gradient descent for each mini-batch
            for mini_batch in mini_batches:
                X_batch, Y_batch = mini_batch
                _, mini_batch_cost = sess.run([train_net, cost], feed_dict={A_0: X_batch, Y: Y_batch})
                epoch_cost += mini_batch_cost / num_minibatches

            if epoch % 2 == 0:
                costs.append(epoch_cost)
            if epoch % 100 == 0:
                print(epoch_cost)

        with open("output.txt", "w+") as file:
            file.write("%f" % epoch_cost)
        plt.ylim(0, 2)
        plt.xlabel("epochs (per 2)")
        plt.ylabel("cost")
        plt.plot(costs)
        plt.show()
        params = sess.run(parameters)
    return params

Train the model using the function defined above.

Use X_data and y_data as the training input, learning rate = 0.001, num_iter = 1000, and
mini-batch size = 256.
Assign the trained parameters to the variable 'parameters'.

In [16]: parameters = model_with_minibatch(X_data, y_data, layer_dims, learning_rate=0.001,
                                   num_iter=1000, mini_batch_size=256)

1.0600777665774028
0.3384200731913249
0.2255556285381317
0.1712941179672877
0.1367433468500773
0.10704284906387329
0.08675921087463698
0.06883981203039488
0.05524631341298421
0.04730481530229251

