Deep Learning with R
Abhijit Ghatak
Kolkata, India
ISBN 978-981-13-5849-4 ISBN 978-981-13-5850-0 (eBook)
https://doi.org/10.1007/978-981-13-5850-0
Library of Congress Control Number: 2019933713
© Springer Nature Singapore Pte Ltd. 2019
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, expressed or implied, with respect to the material contained
herein or for any errors or omissions that may have been made. The publisher remains neutral with regard
to jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
I dedicate this book to the deep learning
fraternity at large, who are trying their best
to get systems to reason over long time
horizons.
Preface
Artificial Intelligence
The term ‘Artificial Intelligence’ (AI) was coined by John McCarthy in 1956, but
the journey to understand whether machines can truly think began well before that.
Vannevar Bush [1], in his seminal work As We May Think,1 proposed a system
that amplifies people’s own knowledge and understanding.
Alan Turing was a pioneer in bringing AI from the realm of philosophical
speculation to reality. He wrote a paper on the notion of machines being able to
simulate human beings and do intelligent things. He also realized, in the 1950s,
that we would need a much greater understanding of human intelligence before
we could hope to build machines that would “think” like humans. His 1950 paper,
“Computing Machinery and Intelligence” (published in a philosophical journal
called Mind), opened the doors to the field that would be called AI, long before
the term was actually adopted. The paper defined what would become known as
the Turing test,2 a model for measuring “intelligence.”
Significant AI breakthroughs have been promised “in the next 10 years,” for the
past 60 years. One of the proponents of AI, Marvin Minsky, claimed in
1967—“Within a generation …, the problem of creating “artificial intelligence” will
substantially be solved,” and in 1970, he quantified his earlier prediction by
stating—“In from three to eight years we will have a machine with the general
intelligence of a human being.”
In the 1960s and early 1970s, several other experts believed AI to be right around
the corner. When it did not materialize, funding dried up and research activity
declined, resulting in what we now call the first AI winter.
During the 1980s, interest in an approach to AI known as expert systems started
gathering momentum, and a significant amount of money was spent on research
and development. By the beginning of the 1990s, owing to the limited scope of
expert systems, interest waned, resulting in the second AI winter. Somehow, it
appeared that expectations in AI always outpaced the results.

1 https://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/303881/.
2 https://www.turing.org.uk/scrapbook/test.html.
Evolution of Expert Systems to Machine Learning
An expert system (ES) is a program designed to solve problems in a specific
domain in place of a human expert. By mimicking the thinking of human experts,
the expert system was envisaged to analyze information and make decisions.
The knowledge base of an ES contains both factual knowledge and heuristic
knowledge. The ES inference engine was supposed to provide a methodology for
reasoning over the information present in the knowledge base. Its goal was to come
up with a recommendation; to do so, it combined the facts of a specific case (input
data) with the knowledge contained in the knowledge base (rules), resulting in a
particular recommendation (answers).
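To make the idea concrete, here is a minimal, hypothetical sketch in R (the facts and rules are invented for illustration and are not taken from any actual expert system): hand-written rules are applied to the facts of a case to produce a recommendation.

# Facts of a specific case (input data); the values are hypothetical.
facts <- list(temperature = 39.2, cough = TRUE)

# Hand-crafted rules (the knowledge base); each rule returns a
# recommendation when its condition matches, and NULL otherwise.
rules <- list(
  function(f) if (f$temperature > 38 && f$cough) "suspect influenza",
  function(f) if (f$temperature <= 38) "no fever detected"
)

# A toy inference engine: apply every rule to the facts and keep
# only the conclusions that fired.
recommendations <- Filter(Negate(is.null), lapply(rules, function(r) r(facts)))
print(recommendations)

The point of the sketch is that every rule must be written by a human expert; nothing is learned from data, which is exactly the limitation discussed next.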
Though an ES was suitable for solving some well-defined logical problems, it
proved inadequate for other types of complex problems like image classification
and natural language processing (NLP). As a result, expert systems did not live up
to expectations, prompting a shift from the rule-based approach to a data-driven
approach. This paved the way to a new era in AI: machine learning.
Research over the past 60 years has resulted in significant advances in search
algorithms and machine learning algorithms, and in integrating statistical analysis
to understand the world at large.
In machine learning, the system is trained rather than explicitly programmed
(unlike an ES). By exposing large quantities of known facts (input data and
answers) to a learning mechanism and performing tuning sessions, we get a system
that can make predictions or classifications on unseen input data. It does this by
discovering the statistical structure of the input data (and the answers) and coming
up with rules for automating the task.
Starting in the 1990s, machine learning quickly became the most popular
subfield of AI. This trend has also been driven by the availability of faster
computing hardware and of large, diverse data sets.
A machine learning algorithm transforms its input data into meaningful outputs
by learning representations of the data. Representations are transformations of the
input data that bring it closer to the expected output. “Learning,” in the context of
machine learning, is an automatic search process for better representations of data.
Machine learning algorithms find these representations by searching through a
predefined set of operations.
To summarize, machine learning is a search for useful representations of the
input data within a predefined space, using the loss function (the difference between
the actual output and the estimated output) as feedback to modify the parameters
of the model.
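As an illustration only (the data and the one-parameter model are made up, and the book develops these ideas properly in Chaps. 1 and 2), here is a minimal R sketch of that feedback loop: a squared-error loss is computed on known answers, and its gradient is used to nudge the parameter of a linear model.

# Known facts: inputs x and answers y generated by y = 3x (plus noise).
set.seed(1)
x <- runif(50)
y <- 3 * x + rnorm(50, sd = 0.1)

w  <- 0     # the model parameter we are searching for
lr <- 0.1   # learning rate

for (i in 1:200) {
  y_hat <- w * x                       # estimated output
  loss  <- mean((y - y_hat)^2)         # squared-error loss (the feedback)
  grad  <- mean(-2 * x * (y - y_hat))  # gradient of the loss w.r.t. w
  w     <- w - lr * grad               # modify the parameter using the feedback
}
w   # ends up close to 3

The loop never encodes the rule y = 3x explicitly; it recovers it from the data by repeatedly using the loss as feedback.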
Machine Learning and Deep Learning
It turns out that machine learning focuses on learning only one or two layers of
representations of the input data. This proved intractable for human perception
problems like image classification, speech recognition, handwriting transcription,
etc. It therefore gave way to a new take on learning representations, which puts
an emphasis on learning multiple successive layers of representations, resulting in
deep learning. The word deep in deep learning refers only to the number of
successive layers used in a deep learning model.
In deep learning, we deal with layers. A layer is a data transformation function:
it transforms the data that passes through it. These transformations are
parametrized by a set of weights and biases, which determine the transformation
behavior at that layer.
Deep learning is a specific subfield of machine learning, which makes use of
tens/hundreds of successive layers of representations. The specification of what a
layer does to its input is stored in the layer’s parameters. Learning in deep learning
can also be defined as finding a set of values for the parameters of each layer of a
deep learning model, which will result in the appropriate mapping of the inputs to
the associated answers (outputs).
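To make this concrete, here is a minimal sketch in base R (the dimensions, the random weights, and the sigmoid activation are chosen purely for illustration; Chap. 2 builds real networks): each layer transforms its input using its own weights and biases, and learning means finding good values for these parameters.

sigmoid <- function(z) 1 / (1 + exp(-z))

x <- matrix(c(0.2, 0.7, 0.1), ncol = 1)    # one input with three features

# Parameters of layer 1 (weights and biases); in practice these are learned.
W1 <- matrix(rnorm(4 * 3), nrow = 4); b1 <- rep(0, 4)
# Parameters of layer 2.
W2 <- matrix(rnorm(1 * 4), nrow = 1); b2 <- 0

a1 <- sigmoid(W1 %*% x + b1)    # first layer of representation
a2 <- sigmoid(W2 %*% a1 + b2)   # second layer maps it to the output
a2

Each additional layer simply repeats the same pattern with its own weight matrix and bias vector.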
Deep learning has proven to be better than conventional machine learning
algorithms for these “perceptual” tasks, but it has not yet been proven to be better
across all other domains.
Applications and Research in Deep Learning
Deep learning has been gaining traction in many fields, some of which are listed
below. Although most of the work to date is proof-of-concept (PoC), some of the
results have actually provided new physical insight.
• Engineering—Signal processing techniques using traditional machine learning
exploit shallow architectures often containing a single layer of nonlinear feature
transformation. Examples of shallow architecture models are conventional
hidden Markov models (HMMs), linear or nonlinear dynamical systems,
conditional random fields (CRFs), maximum entropy (MaxEnt) models, support
vector machines (SVMs), kernel regression, multilayer perceptron (MLP) with a
single hidden layer, etc. Signal processing using machine learning also depends
a lot on handcrafted features. Deep learning can help in obtaining task-specific
feature representations, in learning how to deal with noise in the signal, and in
working with long-term sequential behavior. Vision and speech signals require
deep architectures for extracting complex structures, and deep learning can
provide the necessary architecture. Specific signal processing areas where deep
learning is being applied are speech/audio, image/video, language processing,
and information retrieval. All this can be improved with better feature extraction
at every layer, more powerful discriminative optimization techniques, and more
advanced architectures for modeling sequential data.
• Neuroscience—Cutting-edge research in human neuroscience using deep
learning is already happening. The cortical activity of “imagination” is being
studied to unveil the computational and systems-level mechanisms that underpin
the phenomenon of human imagination. Deep learning is being used to understand
certain neurophysiological phenomena, such as the firing properties of dopamine
neurons in the mammalian basal ganglia (a group of subcortical nuclei, of varied
origin, in the brains of vertebrates including humans, that are associated with a
variety of functions like eye movements, emotion, cognition, and the control of
voluntary motor movements). A growing community of researchers is working to
distill intelligence into algorithms so that they incrementally mimic the human
brain.
• Oncology—Cancer is the second leading health-related cause of death in the
world. Early detection of cancer increases the probability of survival by nearly
10 times, and deep learning has demonstrated capabilities in achieving higher
diagnostic accuracy with respect to many domain experts. Cancer detection from
gene expression data is challenging due to its high dimensionality and complexity.
Researchers have developed DeepGene,3 which is an advanced cancer
classifier based on deep learning. It addresses the obstacles in existing somatic
point mutation-based cancer classification (SMCC) studies, and the results
outperform three widely adopted existing classifiers. Google’s CNN system4 has
demonstrated the ability to identify deadly skin cancers at an accuracy rate on
a par with practitioners. Shanghai University has developed a deep learning
system that can accurately differentiate between benign and malignant breast
tumors on ultrasound shear wave elastography (SWE), yielding more than 93%
accuracy on the elastogram images of more than 200 patients.5
• Physics—Conseil Européen pour la Recherche Nucléaire (CERN) at Geneva
handles multiple petabytes of data per day during a single run of the Large
Hadron Collider (LHC). Protons or ions are collided in the LHC, and each
collision is recorded. Every collision creates particles, such as a Higgs boson, a
pair of top quarks, or some mini black holes, which leave a trailing signature.
Deep learning is being used to classify and interpret these signatures.
• Astrophysics—Deep learning is being extensively used to classify galaxy
morphologies.6
3 https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1334-9.
4 https://www.nature.com/articles/nature21056.
5 https://www.umbjournal.org/article/S0301-5629(17)30002-9/abstract.
6 https://arxiv.org/abs/0908.2033.
• Natural Language Processing—The number of deep learning research papers
in NLP has been rising since 2012 (see Fig. 1), as reflected in the paper titled
Recent Trends in Deep Learning Based Natural Language Processing by Young
et al.
• Near human-level proficiency has been achieved in (a) speech recognition,
(b) image recognition, (c) handwriting transcription, and (d) autonomous driving.
Moreover, super-human-level performance has been achieved by AlphaGo (built
by Google) when it defeated the world's best player, Lee Sedol, at Go.
Fig. 1 Percentage of deep learning papers submitted for various conferences—Association for
Computational Linguistics (ACL), Conference on Empirical Methods in Natural Language
Processing (EMNLP), European Chapter of the Association for Computational Linguistics
(EACL), North American Chapter of the Association for Computational Linguistics (NAACL),
over the 6 years preceding 2018. [2]
Intended Audience
This book has been written to address a wide spectrum of learners:
• For the beginner, this book will be useful to understand the basic concepts of
machine/deep learning and the neural network architecture (Chaps. 1 and 2)
before moving on to the advanced concepts of deep learning.
• For the graduate student, this book will help the reader understand the behavior
of different types of neural networks by understanding the concepts, while
building them up from scratch. It will also introduce the reader to relevant
research papers for further exploration.
• For the data scientist who is familiar with the underlying principles of machine
learning, this book will provide a practical understanding of deep learning.
• For the deep learning enthusiast, this book will explain the deep learning
architecture and what goes on inside a neural network model.
An intermediate level of R programming knowledge is expected of the reader,
and no previous experience of deep learning is assumed.
Kolkata, India Abhijit Ghatak
Acknowledgements
Acknowledgment is an unsatisfactory word for my deepest debts.
My father bequeathed to me a love for adventure and an interest in history,
literature, and mathematics.
My professors at the Faculty of Mechanical Engineering, Jadavpur University,
instilled an appetite for analysis and quantitative techniques in engineering; my
mentor and advisor at University of Pune, Prof. SY Bhave, motivated me to
interpret the algorithm and write a program using the C language on predicting
torsional vibration failures of a marine propulsion shaft using state vectors; and my
advisors at Stevens Institute of Technology helped me transition from a career as a
submarine engineer in the Indian Navy to that of a data scientist.
My wife Sushmita lived through the slow gestation of this book. She listened
and engaged with me all the way. She saw potential in this work long before I did
and encouraged me to keep going.
I owe my thanks to Sunanda for painstakingly proofreading the manuscript.
I also have two old debts: Robert Louis Stevenson and Arthur Conan
Doyle. In Treasure Island, Captain Smollett is most eager to discover the treasure
and says, “We must go on,” and in A Case of Identity, Sherlock Holmes states, “It
has long been an axiom of mine that the little things are infinitely the most
important.” Both are profound statements in the realm of a new science, and the
litterateurs had inked their thoughts claiming no distinction, when there is no
distinction between the nature of the pursuits.
I owe all of them my deepest debts.
Abhijit Ghatak
About This Book
• Deep learning is a growing area of interest to academia and industry alike. The
applications of deep learning range from medical diagnostics and robotics to
security and surveillance, computer vision, natural language processing, and
autonomous driving. This has been made possible largely by a confluence of
research activity around the subject and the emergence of APIs like Keras.
• This book is a sequel to Machine Learning with R, written by the same author,
and explains deep learning from first principles—how to construct different
neural network architectures and understand the hyperparameters of the neural
network and the need for various optimization algorithms. The theory and the
math are explained in detail before discussing the code in R. The different
functions are finally merged to create a customized deep learning application. It
also introduces the reader to the Keras and TensorFlow libraries in R and explains
the advantage of using these libraries to get a basic model up and running.
• This book builds on the understanding of deep learning to create R-based
applications in computer vision, natural language processing, and transfer learning.
This book has been written to address a wide spectrum of learners:
• For the beginner, this book will be useful to understand the basic concepts of
machine/deep learning and the neural network architecture (Chaps. 1 and 2)
before moving on to the advanced concepts of deep learning.
• For the graduate student, this book will help the reader to understand the
behavior of different types of neural networks by understanding the concepts,
while building them up from scratch. It will also introduce the reader to relevant
research papers for further exploration.
• For the data scientist who is familiar with the underlying principles of machine
learning, this book will provide a practical understanding of deep learning.
• For the deep learning enthusiast, this book will explain the deep learning
architecture and what goes on inside a neural network model.
This book requires an intermediate level of skill in R and no previous experience
of deep learning.
Contents
1 Introduction to Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Difference Between Machine Learning and Statistics . . . . . 2
1.1.2 Difference Between Machine Learning and Deep
Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Bias and Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Bias–Variance Trade-off in Machine Learning . . . . . . . . . . . . . . 4
1.4 Addressing Bias and Variance in the Model . . . . . . . . . . . . . . . . 5
1.5 Underfitting and Overfitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.6 Loss Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.7 Regularization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.8 Gradient Descent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.9 Hyperparameter Tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.9.1 Searching for Hyperparameters . . . . . . . . . . . . . . . . . . . 11
1.10 Maximum Likelihood Estimation . . . . . . . . . . . . . . . . . . . . . . . . 12
1.11 Quantifying Loss . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.11.1 The Cross-Entropy Loss . . . . . . . . . . . . . . . . . . . . . . . . 14
1.11.2 Negative Log-Likelihood . . . . . . . . . . . . . . . . . . . . . . . 15
1.11.3 Entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.11.4 Cross-Entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.11.5 Kullback–Leibler Divergence . . . . . . . . . . . . . . . . . . . . 19
1.11.6 Summarizing the Measurement of Loss . . . . . . . . . . . . . 20
1.12 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2 Introduction to Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2 Types of Neural Network Architectures . . . . . . . . . . . . . . . . . . . 25
2.2.1 Feedforward Neural Networks (FFNNs) . . . . . . . . . . . . 25
2.2.2 Convolutional Neural Networks (ConvNets) . . . . . . . . . 25
2.2.3 Recurrent Neural Networks (RNNs) . . . . . . . . . . . . . . . 25
2.3 Forward Propagation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3.1 Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3.2 Input Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.3.3 Bias Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.3.4 Weight Matrix of Layer-1 . . . . . . . . . . . . . . . . . . . . . . . 29
2.3.5 Activation Function at Layer-1 . . . . . . . . . . . . . . . . . . . 30
2.3.6 Weights Matrix of Layer-2 . . . . . . . . . . . . . . . . . . . . . . 30
2.3.7 Activation Function at Layer-2 . . . . . . . . . . . . . . . . . . . 32
2.3.8 Output Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.3.9 Summary of Forward Propagation . . . . . . . . . . . . . . . . . 34
2.4 Activation Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.4.1 Sigmoid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.4.2 Hyperbolic Tangent . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.4.3 Rectified Linear Unit . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.4.4 Leaky Rectified Linear Unit . . . . . . . . . . . . . . . . . . . . . 38
2.4.5 Softmax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.5 Derivatives of Activation Functions . . . . . . . . . . . . . . . . . . . . . . 42
2.5.1 Derivative of Sigmoid . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.5.2 Derivative of tanh . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.5.3 Derivative of Rectified Linear Unit . . . . . . . . . . . . . . . . 44
2.5.4 Derivative of Leaky Rectified Linear Unit . . . . . . . . . . . 44
2.5.5 Derivative of Softmax . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.6 Cross-Entropy Loss . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.7 Derivative of the Cost Function . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.7.1 Derivative of Cross-Entropy Loss with Sigmoid . . . . . . . 49
2.7.2 Derivative of Cross-Entropy Loss with Softmax . . . . . . . 49
2.8 Back Propagation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.8.1 Summary of Backward Propagation . . . . . . . . . . . . . . . . 53
2.9 Writing a Simple Neural Network Application . . . . . . . . . . . . . . 54
2.10 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3 Deep Neural Networks-I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.1 Writing a Deep Neural Network (DNN) Algorithm . . . . . . . . . . . 65
3.2 Overview of Packages for Deep Learning in R . . . . . . . . . . . . . . 80
3.3 Introduction to keras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
3.3.1 Installing keras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
3.3.2 Pipe Operator in R . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
3.3.3 Defining a keras Model . . . . . . . . . . . . . . . . . . . . . . . 81
3.3.4 Configuring the keras Model . . . . . . . . . . . . . . . . . . . 81
3.3.5 Compile and Fit the Model . . . . . . . . . . . . . . . . . . . . . . 82
3.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4 Initialization of Network Parameters . . . . . . . . . . . . . . . . . . . . . . . . 87
4.1 Initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.1.1 Breaking Symmetry . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.1.2 Zero Initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.1.3 Random Initialization . . . . . . . . . . . . . . . . . . . . . . . . . . 93
4.1.4 Xavier Initialization . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.1.5 He Initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.2 Dealing with NaNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
4.2.1 Hyperparameters and Weight Initialization . . . . . . . . . . . 100
4.2.2 Normalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
4.2.3 Using Different Activation Functions . . . . . . . . . . . . . . . 101
4.2.4 Use of NanGuardMode, DebugMode,
or MonitorMode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
4.2.5 Numerical Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
4.2.6 Algorithm Related . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
4.2.7 NaN Introduced by AllocEmpty . . . . . . . . . . . . . . . . . . 101
4.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
5 Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
5.2 Gradient Descent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5.2.1 Gradient Descent or Batch Gradient Descent . . . . . . . . . 104
5.2.2 Stochastic Gradient Descent . . . . . . . . . . . . . . . . . . . . . 105
5.2.3 Mini-Batch Gradient Descent . . . . . . . . . . . . . . . . . . . . 105
5.3 Parameter Updates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
5.3.1 Simple Update . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
5.3.2 Momentum Update . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
5.3.3 Nesterov Momentum Update . . . . . . . . . . . . . . . . . . . . 109
5.3.4 Annealing the Learning Rate . . . . . . . . . . . . . . . . . . . . . 110
5.3.5 Second-Order Methods . . . . . . . . . . . . . . . . . . . . . . . . . 111
5.3.6 Per-Parameter Adaptive Learning Rate Methods . . . . . . . 112
5.4 Vanishing Gradient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
5.5 Regularization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
5.5.1 Dropout Regularization . . . . . . . . . . . . . . . . . . . . . . . . . 127
5.5.2 ℓ2 Regularization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
5.5.3 Combining Dropout and ℓ2 Regularization? . . . . . . . . . . 144
5.6 Gradient Checking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
5.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
6 Deep Neural Networks-II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
6.1 Revisiting DNNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
6.2 Modeling Using keras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
6.2.1 Adjust Epochs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
6.2.2 Add Batch Normalization . . . . . . . . . . . . . . . . . . . . . . . 159
6.2.3 Add Dropout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
6.2.4 Add Weight Regularization . . . . . . . . . . . . . . . . . . . . . . 161
6.2.5 Adjust Learning Rate . . . . . . . . . . . . . . . . . . . . . . . . . . 163
6.2.6 Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
6.3 Introduction to TensorFlow . . . . . . . . . . . . . . . . . . . . . . . . . . 164
6.3.1 What is Tensor ‘Flow’? . . . . . . . . . . . . . . . . . . . . . . . 165
6.3.2 Keras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
6.3.3 Installing and Running TensorFlow . . . . . . . . . . . . . . 166
6.4 Modeling Using TensorFlow . . . . . . . . . . . . . . . . . . . . . . . . . 167
6.4.1 Importing MNIST Data Set from TensorFlow . . . . . . 167
6.4.2 Define Placeholders . . . . . . . . . . . . . . . . . . . . . . . 168
6.4.3 Training the Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
6.4.4 Instantiating a Session and Running the Model . . . . . 169
6.4.5 Model Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
6.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
7 Convolutional Neural Networks (ConvNets) . . . . . . . . . . . . . . . . . . . 171
7.1 Building Blocks of a Convolution Operation . . . . . . . . . . . . . . . 171
7.1.1 What is a Convolution Operation? . . . . . . . . . . . . . . . . 171
7.1.2 Edge Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
7.1.3 Padding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
7.1.4 Strided Convolutions . . . . . . . . . . . . . . . . . . . . . . . . . . 176
7.1.5 Convolutions over Volume . . . . . . . . . . . . . . . . . . . . . . 177
7.1.6 Pooling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
7.2 Single-Layer Convolutional Network . . . . . . . . . . . . . . . . . . . . . 180
7.2.1 Writing a ConvNet Application . . . . . . . . . . . . . . . . . . . 181
7.3 Training a ConvNet on a Small DataSet Using keras . . . . . . . . 186
7.3.1 Data Augmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
7.4 Specialized Neural Network Architectures . . . . . . . . . . . . . . . . . 193
7.4.1 LeNet-5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
7.4.2 AlexNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
7.4.3 VGG-16 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
7.4.4 GoogleNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
7.4.5 Transfer Learning or Using Pretrained Models . . . . . . . . 196
7.4.6 Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
7.5 What is the ConvNet Learning? A Visualization of Different
Layers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
7.6 Introduction to Neural Style Transfer . . . . . . . . . . . . . . . . . . . . . 203
7.6.1 Content Loss . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
7.6.2 Style Loss . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
7.6.3 Generating Art Using Neural Style Transfer . . . . . . . . . . 204
7.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
8 Recurrent Neural Networks (RNN) or Sequence Models . . . . . . . . . 207
8.1 Sequence Models or RNNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
8.2 Applications of Sequence Models . . . . . . . . . . . . . . . . . . . . . . . 209
8.3 Sequence Model Architectures . . . . . . . . . . . . . . . . . . . . . . . . . . 209
8.4 Writing the Basic Sequence Model Architecture . . . . . . . . . . . . . 210
8.4.1 Backpropagation in Basic RNN . . . . . . . . . . . . . . . . . . 212
8.5 Long Short-Term Memory (LSTM) Models . . . . . . . . . . . . . . . . 215
8.5.1 The Problem with Sequence Models . . . . . . . . . . . . . . . 215
8.5.2 Walking Through LSTM . . . . . . . . . . . . . . . . . . . . . . . 216
8.6 Writing the LSTM Architecture . . . . . . . . . . . . . . . . . . . . . . . . . 217
8.7 Text Generation with LSTM . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
8.7.1 Working with Text Data . . . . . . . . . . . . . . . . . . . . . . . . 225
8.7.2 Generating Sequence Data . . . . . . . . . . . . . . . . . . . . . . 226
8.7.3 Sampling Strategy and the Importance of Softmax
Diversity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
8.7.4 Implementing LSTM Text Generation
(Character-Level Neural Language Model) . . . . . . . . . . . 227
8.8 Natural Language Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
8.8.1 Word Embeddings . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
8.8.2 Transfer Learning and Word Embedding . . . . . . . . . . . . 231
8.8.3 Analyzing Word Similarity Using Word Vectors . . . . . . 232
8.8.4 Analyzing Word Analogies Using Word Vectors . . . . . . 233
8.8.5 Debiasing Word Vectors . . . . . . . . . . . . . . . . . . . . . . . . 234
8.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
9 Epilogue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
9.1 Gathering Experience and Knowledge . . . . . . . . . . . . . . . . . . . . 239
9.1.1 Research Papers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
9.2 Towards Lifelong Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
9.2.1 Final Words . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
About the Author
Abhijit Ghatak is a Data Engineer and holds graduate degrees in Engineering and
Data Science from India and the USA. He started his career as a submarine engineer
officer in the Indian Navy, where he worked on multiple data-intensive projects
involving submarine operations and submarine construction. He has since worked
in academia, in IT consulting, and as a research scientist in the area of Internet of
Things (IoT) and pattern recognition for the European Union. He has authored
many publications in the areas of engineering, IoT and machine learning. He
presently advises start-up companies on deep learning, pattern recognition and data
analytics. His areas of research include IoT, stream analytics and design of deep
learning systems. He can be reached at [email protected].