1. What is the difference between Machine Learning and Deep Learning?
Machine Learning is a subset of Artificial Intelligence in which statistics and algorithms are used to train machines on data, thereby helping them improve with experience.
Deep Learning is a subset of Machine Learning that mimics the human brain in terms of structures called neurons, which connect together to form neural networks.
2. What is a perceptron?
A perceptron is modeled on a neuron in the human brain. It receives inputs from various entities, applies functions to these inputs, and transforms them into an output.
A perceptron is mainly used to perform binary classification: it takes an input, computes a function based on the weights of the input, and outputs the required transformation.
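As an illustration, here is a minimal sketch of a perceptron's forward computation in NumPy; the weights, bias, and step threshold below are assumed values, chosen by hand to implement logical AND:

import numpy as np

def perceptron(x, w, b):
    # Weighted sum of the inputs plus the bias, passed through a step function
    return 1 if np.dot(w, x) + b > 0 else 0

# Assumed example: weights and bias hand-picked to implement logical AND
w = np.array([1.0, 1.0])
b = -1.5
print(perceptron(np.array([1, 1]), w, b))  # 1: both inputs active
print(perceptron(np.array([0, 1]), w, b))  # 0: weighted sum below threshold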
3. How is Deep Learning better than Machine Learning?
Machine Learning is powerful enough to solve most problems. However, Deep Learning has the upper hand when working with data that has a large number of dimensions. A Deep Learning model can also easily work with very large volumes of data, because it is built to handle exactly that.
4. What are some of the most used applications of Deep Learning?
Deep Learning is used in a variety of fields today. The most used ones are as follows:
Sentiment Analysis
Computer Vision
Automatic Text Generation
Object Detection
Natural Language Processing
Image Recognition
5. What is the meaning of overfitting?
Overfitting is a very common issue in Deep Learning. It is a scenario where the algorithm fits the training data so aggressively that it picks up noise rather than the useful underlying pattern, causing very high variance and low bias. This makes the model less accurate on new data, and it is an undesirable effect that should be prevented.
· Low bias (the model fits the training data very well)
· High variance (the model's predictions vary wildly on new data)
6. What are activation functions?
Activation functions are entities in Deep Learning that translate a neuron's inputs into a usable output. An activation function decides whether a neuron needs to be activated by calculating the weighted sum of its inputs plus the bias. Using a non-linear activation function makes the model's output non-linear. There are many types of activation functions (a short sketch of each follows the list):
ReLU
Softmax
Sigmoid
Linear
Tanh
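For illustration, here is a minimal NumPy sketch of these functions (softmax is written for a vector input; the rest are element-wise):

import numpy as np

def relu(x):
    return np.maximum(0, x)            # passes positives, zeroes negatives

def sigmoid(x):
    return 1 / (1 + np.exp(-x))        # squashes values into (0, 1)

def linear(x):
    return x                           # identity: output stays linear

def softmax(x):
    e = np.exp(x - np.max(x))          # subtract the max for numerical stability
    return e / e.sum()                 # outputs form a probability distribution

x = np.array([-2.0, 0.0, 3.0])
print(relu(x), sigmoid(x), np.tanh(x), softmax(x))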
7. Why is the Fourier transform used in Deep Learning?
The Fourier transform is an effective tool for analyzing and managing large amounts of signal data. It decomposes a signal into its constituent frequencies, can take in real-time array data, and processes it quickly. This maintains high efficiency and also makes the model more open to processing a variety of signals.
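As a small illustration, NumPy's FFT routines (a fast implementation of the discrete Fourier transform) can decompose an array of samples into frequency components; the signal below is an assumed example:

import numpy as np

# Assumed signal: 1 second sampled at 100 Hz, a 5 Hz sine wave plus noise
t = np.linspace(0, 1, 100, endpoint=False)
signal = np.sin(2 * np.pi * 5 * t) + 0.1 * np.random.randn(100)

spectrum = np.fft.rfft(signal)             # frequency-domain representation
freqs = np.fft.rfftfreq(100, d=1 / 100)    # frequency labels in Hz
print(freqs[np.abs(spectrum).argmax()])    # dominant frequency: ~5.0 Hz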
8. What are the steps involved in training a perceptron in Deep Learning?
There are five main steps that determine the learning of a perceptron (a minimal sketch follows the list):
1. Initialize thresholds and weights
2. Provide inputs
3. Calculate outputs
4. Update weights in each step
5. Repeat steps 2 to 4
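Referenced above, here is a minimal NumPy sketch of these five steps, using the classic perceptron update rule on assumed toy data (logical AND):

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # assumed inputs: logical AND
y = np.array([0, 0, 0, 1])                       # target outputs

w = np.zeros(2)                                  # Step 1: initialize weights
b = 0.0                                          #         and the threshold/bias
lr = 0.1

for epoch in range(10):                          # Step 5: repeat steps 2 to 4
    for xi, target in zip(X, y):                 # Step 2: provide inputs
        out = 1 if np.dot(w, xi) + b > 0 else 0  # Step 3: calculate output
        w += lr * (target - out) * xi            # Step 4: update weights
        b += lr * (target - out)

print(w, b)  # a separating line for AND has been learned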
9. What is the use of the loss function?
The loss function is used as a measure of how accurately a neural network has learned from the training data. It works by comparing the network's predictions against the actual target values.
The loss function is a primary measure of the performance of the neural network. In Deep Learning, a well-performing network will keep the loss function low at all times during training.
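For example, two common loss functions sketched in NumPy, with assumed prediction and target arrays:

import numpy as np

y_true = np.array([1.0, 0.0, 1.0])   # assumed target values
y_pred = np.array([0.9, 0.2, 0.7])   # assumed network predictions

# Mean squared error: a typical loss for regression
mse = np.mean((y_true - y_pred) ** 2)

# Binary cross-entropy: a typical loss for binary classification
bce = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

print(mse, bce)  # lower values indicate a better-performing network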
10. What are some of the Deep Learning frameworks or tools that you have
used?
This question is quite common in a Deep Learning interview. Make sure to answer
based on the experience you have with the tools.
However, some of the top Deep Learning frameworks out there today are:
TensorFlow
Keras
PyTorch
Caffe2
CNTK
MXNet
Theano
11. What is the use of the swish function?
The swish function is a self-gated activation function developed by Google. It is now a popular activation function, as Google's researchers report that it matches or outperforms ReLU and other activation functions on many deep models.
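Swish is defined as the input multiplied by its own sigmoid; a minimal sketch:

import numpy as np

def swish(x):
    return x * (1 / (1 + np.exp(-x)))  # x * sigmoid(x): self-gated

print(swish(np.array([-2.0, 0.0, 2.0])))  # smooth and non-monotonic near zero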
13. What are the steps to be followed to use the gradient descent algorithm?
There are five main steps used to initialize and run the gradient descent algorithm (a minimal sketch follows the list):
Initialize biases and weights for the network
Send input data through the network (the input layer)
Calculate the difference (the error) between expected and predicted values
Change values in the neurons to minimize the loss function
Iterate multiple times to determine the best weights for efficient working
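As referenced above, a minimal sketch of these steps for a one-parameter linear model on assumed toy data:

import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0])   # assumed inputs
y = 2 * X                            # assumed ground truth: slope of 2

w, b = 0.0, 0.0                      # Step 1: initialize weights and biases
lr = 0.05

for _ in range(200):                 # Step 5: iterate to find the best weights
    y_pred = w * X + b               # Step 2: send input data through
    error = y_pred - y               # Step 3: expected vs. predicted difference
    w -= lr * np.mean(error * X)     # Step 4: change values to minimize the loss
    b -= lr * np.mean(error)

print(w, b)  # w approaches 2, b approaches 0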
14. Differentiate between a single-layer perceptron and a multi-layer perceptron.
Here is the difference between a single-layer perceptron and a multi-layer perceptron:
Single-layer perceptron: cannot classify non-linear data points; takes in a limited number of parameters; less efficient with large data.
Multi-layer perceptron: can classify non-linear data; withstands a large number of parameters; highly efficient with large datasets.
15. What is data normalization in Deep Learning?
Data normalization is a preprocessing step that is used to refit the data into a specific
range. This ensures that the network can learn effectively as it has better convergence
when performing backpropagation.
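A minimal sketch of one common form, z-score normalization, on an assumed feature matrix:

import numpy as np

X = np.array([[150.0, 0.1],
              [160.0, 0.3],
              [170.0, 0.5]])   # assumed features on very different scales

# Refit each feature (column) to zero mean and unit standard deviation
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_norm)                  # both columns now share a comparable range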
16. What is forward propagation?
Forward propagation is the process in which inputs are passed to the hidden layers with weights. In every hidden layer, the output of the activation function is calculated and passed on until the final layer is reached. It is called forward propagation because the process begins at the input layer and moves toward the final output layer.
17. What is backpropagation?
Backpropagation is used to minimize the cost function by first measuring how its value changes when weights and biases are tweaked in the neural network. This change is calculated efficiently by computing the gradient at every hidden layer. It is called backpropagation because the process begins at the output layer and moves backward toward the input layer.
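To make both directions concrete, here is a minimal NumPy sketch of one forward pass followed by one backpropagation step through a single hidden layer; the layer sizes and data are assumed, and biases are omitted for brevity:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))            # assumed: 4 samples, 3 input features
y = rng.normal(size=(4, 1))            # assumed regression targets

W1 = rng.normal(size=(3, 5)) * 0.1     # input -> hidden weights
W2 = rng.normal(size=(5, 1)) * 0.1     # hidden -> output weights
lr = 0.1

# Forward propagation: input layer -> hidden layer -> output layer
h = np.tanh(X @ W1)                    # hidden activations
y_pred = h @ W2                        # network output

# Backpropagation: gradients flow from the output layer back to the input
grad_out = 2 * (y_pred - y) / len(X)   # d(MSE loss)/d(output)
grad_W2 = h.T @ grad_out               # gradient at the output weights
grad_h = grad_out @ W2.T * (1 - h**2)  # chain rule through tanh
grad_W1 = X.T @ grad_h                 # gradient at the hidden weights

W1 -= lr * grad_W1                     # gradient-descent weight updates
W2 -= lr * grad_W2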
18. What are hyperparameters in Deep Learning?
Hyperparameters are variables that determine the structure and training behavior of a neural network. They include parameters such as the learning rate, the number of hidden layers, and more.
19. How can hyperparameters be trained in neural networks?
Hyperparameters can be tuned using four main components, as shown below (a minimal sketch of how they are set follows the list):
Batch size: This denotes the size of the input chunk fed to the network at a time. Batch sizes can be varied and split into sub-batches based on the requirement.
Epochs: An epoch denotes one full pass of the training data through the neural network. Since training is iterative, the appropriate number of epochs varies with the data.
Momentum: Momentum carries information from previous update steps into the current one. It is used to avoid oscillations during training.
Learning rate: The learning rate is a parameter that controls how large a step the network takes each time it updates its parameters.
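In practice, these four components map directly onto framework settings. A minimal Keras sketch, where the model, data, and chosen values are all assumptions for illustration:

import numpy as np
import tensorflow as tf

X = np.random.rand(100, 10)   # assumed toy inputs
y = np.random.rand(100, 1)    # assumed toy targets

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])

# Learning rate and momentum are set on the optimizer
opt = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)
model.compile(optimizer=opt, loss="mse")

# Batch size and the number of epochs are set when training starts
model.fit(X, y, batch_size=32, epochs=5, verbose=0)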
20. What is Deep Learning?
Deep Learning is a subset of Machine Learning within the field of Artificial Intelligence. It is used to teach computers to process data in a way that is inspired by the human brain, enabling them to recognize complex patterns in pictures, text, sounds, and so on.
21. What are Neural Networks?
A Neural Network, also known as an Artificial Neural Network, is a machine learning model that consists of interconnected nodes, or neurons, which process and learn from data.
22. What are the advantages and disadvantages of neural networks?
Advantages of Neural Networks:
Neural networks can learn complex models and non-linear relationships.
They store information across the entire network with the help of nodes.
Neural networks can also work with unorganized data.
Neural networks can perform more than one function at a time.
Even if one or more cells are corrupted, the output is not significantly affected.
Disadvantages of Neural Networks:
Because they must adapt quickly to changing requirements, neural networks require heavy machinery and hardware to work.
Neural networks depend on a lot of training data, which can lead to the problem of overfitting.
Neural networks require lots of computational power because, like a human brain, they are composed of many interconnected nodes, and each node computes based on its weights.
Neural networks are much more complex and harder to explain than other models.
Neural network models need careful attention during data preparation, because it is a crucial step and poorly prepared input data harms the results.
23. What is the Learning Rate in the context of Neural Network Models?
The learning rate is a hyperparameter that controls the size of the updates made to the weights during training. It determines the size of each step in each training iteration. Common starting values for the learning rate are 0.1 or 0.01, and it is usually represented by the symbol α (alpha).
24. What is a Deep Neural Network?
A deep neural network is a machine learning algorithm that mimics the brain’s
information processing. It’s made up of multiple layers of nodes known as neurons.
DNN is used in complex mathematical modeling.
25. What are the different types of Deep Neural Networks?
There are 4 types of deep neural networks:
1. Feed Forward Neural Network: The Feed Forward Neural Network is the basic
neural network, whose flow control starts from the input layer and moves forward
to the output layer. The data will flow only in a single direction; there is no
backpropagation mechanism.
2. Recurrent Neural Network: A recurrent neural network is another type of deep neural network in which the output of a layer is saved and fed back into the network. Each neuron in the hidden layer receives its input with a specific delay in time, which lets the network remember previous inputs in a sequence.
3. Convolutional Neural Network: A convolutional neural network is a special kind of
neural network that we can use for image classification, clustering of images, and so
on.
4. Restricted Boltzmann Machine: A Restricted Boltzmann Machine is a type of Boltzmann Machine in which the neurons in the input layer and the hidden layer are joined by symmetric connections, with no connections within a layer. This algorithm can be used in filtering, feature learning, and risk detection.
26. Explain Data Normalization. What is the need for it?
Data normalization rescales the input features so that they all lie on a comparable scale. A common approach is to subtract the mean and divide by the standard deviation for each feature. Normalization makes training more stable: if the features in the dataset are not on the same scale, the data becomes difficult to learn.
27. What is the meaning of dropout in Deep Learning?
Dropout is a technique used to avoid overfitting a model in Deep Learning. It randomly disables a fraction of neurons during training. If the dropout value is too low, it will have minimal effect on learning; if it is too high, the model can under-learn, causing lower efficiency.
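A minimal sketch of (inverted) dropout applied to a layer's activations; the dropout rate is an assumed value:

import numpy as np

def dropout(activations, rate=0.5, training=True):
    if not training:
        return activations           # dropout is disabled at inference time
    # Randomly zero out a fraction `rate` of units, scaling the rest up
    mask = np.random.rand(*activations.shape) > rate
    return activations * mask / (1 - rate)

h = np.ones((2, 4))                  # assumed activations
print(dropout(h, rate=0.5))          # roughly half the units are zeroed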
28. What are tensors?
Tensors are multidimensional arrays used to represent data in Deep Learning. They can represent data with any number of dimensions. Because Deep Learning frameworks expose tensors through high-level programming languages, their syntax is easily understood and broadly used.
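For example, tensors of increasing rank in NumPy; the same idea carries over to TensorFlow and PyTorch tensors:

import numpy as np

scalar = np.array(5.0)                   # rank-0 tensor
vector = np.array([1.0, 2.0])            # rank-1 tensor
matrix = np.ones((2, 3))                 # rank-2 tensor
images = np.zeros((32, 28, 28, 3))       # rank-4: batch, height, width, channels

print(scalar.ndim, vector.ndim, matrix.ndim, images.ndim)  # 0 1 2 4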
29. What is the meaning of model capacity in Deep Learning?
In Deep Learning, model capacity refers to the range of mapping functions a model is able to learn. A higher model capacity means a larger amount of information can be stored in the network.
30. What is a Boltzmann machine?
A Boltzmann machine is a type of recurrent neural network whose units make binary decisions based on weights and biases. These networks can be stacked together to create deep belief networks, which are very sophisticated and are used to solve the most complex problems out there.
31. What are some of the advantages of using TensorFlow?
TensorFlow has numerous advantages, and some of them are as follows:
High amount of flexibility and platform independence
Trains using CPU and GPU
Supports automatic differentiation
Handles threads and asynchronous computation easily
Open-source
Has a large community
32. What is a computational graph in Deep Learning?
A computational graph is a series of operations that are performed to take inputs and
arrange them as nodes in a graph structure. It can be considered as a way of
implementing mathematical calculations into a graph. This helps in parallel
processing and provides high performance in terms of computational capability.
33. What is a CNN?
CNNs, or convolutional neural networks, are used to analyze images and other visual data, for example in image annotation tasks. These networks can take a multi-channel image as input and work on it easily.
34. What are the various layers present in a CNN?
There are four main layers that form a convolutional neural network (a minimal sketch follows the list):
Convolution: These layers consist of entities called filters, whose values are the parameters trained in the network.
ReLU: This is used as the activation function and is always applied after the convolution layer.
Pooling: Pooling shrinks the complex feature maps that form after convolution and is primarily used to reduce the size of the representation.
Fully connected: This layer connects every neuron in one layer to every neuron in the next, so the final activations can be computed, using the bias, easily.
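Referenced in the list above, a minimal Keras sketch stacking these four layer types; the input shape, filter count, and class count are assumptions:

import tensorflow as tf

model = tf.keras.Sequential([
    # Convolution layer: 16 trainable filters with ReLU as the activation
    tf.keras.layers.Conv2D(16, (3, 3), activation="relu"),
    # Pooling layer: shrinks the feature maps after convolution
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    # Fully connected layer producing the final class scores
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.build(input_shape=(None, 28, 28, 1))   # assumed grayscale 28x28 inputs
model.summary()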
35. What is an RNN in Deep Learning?
RNNs, or recurrent neural networks, are a popular type of artificial neural network. They are used to process sequential data such as text, genomes, handwriting, and more. RNNs make use of backpropagation (through time) for their training requirements.
36. What is a vanishing gradient when using RNNs?
The vanishing gradient is a scenario that often occurs when training RNNs. Since RNNs rely on backpropagation, the gradients tend to get smaller and smaller at every step as the network traverses backward through its iterations. This means the model learns very slowly, causing efficiency problems in the network.
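A small sketch of why this happens: backpropagation multiplies in one derivative factor per step, and for a sigmoid that factor is at most 0.25, so the product shrinks rapidly (the 50-step depth is an assumed example):

import numpy as np

def sigmoid_derivative(x):
    s = 1 / (1 + np.exp(-x))
    return s * (1 - s)                      # peaks at 0.25 when x = 0

gradient = 1.0
for _ in range(50):                         # 50 backward steps through time
    gradient *= sigmoid_derivative(0.0)     # best case: multiply by 0.25

print(gradient)                             # ~7.9e-31: the gradient has vanished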
37. What is exploding gradient descent in Deep Learning?
Exploding gradients are an issue in which the gradients grow uncontrollably large, creating very large updates to the model's weights during training.
Gradient descent works on the assumption that updates are small and controlled; keeping the updates controlled directly affects the efficiency of the model.
38. What is the use of LSTM?
LSTM stands for long short-term memory. It is a type of RNN used to process sequences of data. It contains feedback connections that give it the ability to perform like a general-purpose computational entity.
43. What is Forward and Back Propagation in Deep Learning?
Forward Propagation is the way data moves from left to right in the neural network, i.e., from the input layer to the output layer.
Back Propagation is the way the error signal moves from right to left, i.e., from the output layer back to the input layer. Together, the two passes train the network properly; once the corrected weights are learned, the model is able to converge and generalize the data better.
44. What would happen if we set all the biases and weights to zero to train a
neural network?
If only the biases are set to zero (and the weights are initialized randomly), the neural network still has a chance of learning.
However, if the weights are set to zero, the network will never learn the task: the derivatives for each weight remain identical, so the neurons learn the same features in each iteration and generate poor results.
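A small demonstration with assumed sizes and data: with every weight initialized to zero, all the gradients in this two-layer sketch are also zero, so the weights never move; with any other equal initialization, the neurons would all receive identical updates:

import numpy as np

X = np.array([[1.0, 2.0]])   # assumed single sample
y = np.array([[1.0]])

W1 = np.zeros((2, 2))        # all weights set to zero
W2 = np.zeros((2, 1))

for _ in range(5):
    h = np.tanh(X @ W1)                      # hidden outputs: all identical (0)
    grad_out = 2 * (h @ W2 - y)              # output-layer error
    grad_W2 = h.T @ grad_out
    grad_W1 = X.T @ (grad_out @ W2.T * (1 - h**2))
    W1 -= 0.1 * grad_W1                      # every update is zero
    W2 -= 0.1 * grad_W2

print(W1, W2)  # still all zeros: the network never learned anything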
45. Explain the difference between a Shallow Network and a Deep Network.
Shallow Network: A shallow network has only one hidden layer. In principle it can fit any function, but it may require a very large number of input parameters to do so. Shallow networks also make it easier to see what is going on than deep neural networks.
Deep Network: A deep network has numerous hidden layers and can also fit any function. Deep neural networks are mostly used for data-driven modeling.
46. For the application of Face Detection, which deep learning algorithm would
you use?
The best algorithm for face detection is the Convolutional Neural Network, because CNNs give better accuracy in object detection tasks. Two-stage CNN architectures with a region proposal network (such as Faster R-CNN) further improve localization.
47. What is an Activation Function?
The activation function in artificial neural networks helps the network learn complex patterns in the data. At the end of each neuron's computation, the activation function decides what is to be fired to the next neurons.
48. What do you mean by an Epoch in the context of deep learning?
In deep learning, an epoch refers to one full pass of the entire training dataset through the learning algorithm. The number of epochs is equal to the number of iterations if the batch size is the entire training dataset.
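For example, the relationship between dataset size, batch size, and iterations per epoch, with assumed numbers:

import math

dataset_size = 1000
batch_size = 100

iterations_per_epoch = math.ceil(dataset_size / batch_size)
print(iterations_per_epoch)   # 10 iterations make up one epoch

# If the batch size equals the dataset size, one epoch is one iteration
print(math.ceil(dataset_size / dataset_size))   # 1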
49. What are some of the limitations of Deep Learning?
There are a few disadvantages of Deep Learning as mentioned below:
Networks in Deep Learning require a huge amount of data to train well.
Deep Learning concepts can be complex to implement sometimes.
Achieving a high amount of model efficiency is difficult in many cases.
50. What are the variants of gradient descent?
There are three variants of gradient descent, as shown below (a minimal sketch follows the list):
Stochastic gradient descent: A single training example is used to calculate the gradient and update the parameters.
Batch gradient descent: The gradient is calculated on the entire dataset, and the parameters are updated after each full pass.
Mini-batch gradient descent: Samples are broken down into smaller batches, and parameters are updated after each batch, as in stochastic gradient descent.
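As referenced above, the variants differ only in how much data feeds each update. A minimal sketch of the mini-batch version on assumed data; the same loop becomes stochastic with batch_size=1 and batch with batch_size=len(X):

import numpy as np

X = np.random.rand(1000, 3)           # assumed inputs
y = X @ np.array([1.0, -2.0, 0.5])    # assumed linear ground truth

w = np.zeros(3)
lr, batch_size = 0.1, 32

for epoch in range(20):
    idx = np.random.permutation(len(X))         # shuffle once per epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]   # one mini-batch of samples
        error = X[batch] @ w - y[batch]
        w -= lr * X[batch].T @ error / len(batch)  # update from this batch only

print(w)  # approaches [1.0, -2.0, 0.5]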
51. Why is mini-batch gradient descent so popular?
Mini-batch gradient descent is popular because:
It is more efficient than stochastic gradient descent.
It generalizes well by finding flat minima.
Its gradient noise helps it escape local minima, while each batch still approximates the gradient over the entire dataset.
53. Why is the Leaky ReLU function used in Deep Learning?
Leaky ReLU, also called LReLU, is a variant of the ReLU activation function that allows small negative output values when the input to the unit is less than zero, instead of outputting exactly zero.
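A minimal sketch, using the common slope of 0.01 for negative inputs (the slope is a tunable assumption):

import numpy as np

def leaky_relu(x, alpha=0.01):
    # Pass positive values through; scale negative values by a small slope
    return np.where(x > 0, x, alpha * x)

print(leaky_relu(np.array([-5.0, 0.0, 3.0])))   # [-0.05  0.    3.  ]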
54. What are some of the examples of supervised learning algorithms in Deep
Learning?
There are three main supervised learning algorithms in Deep Learning:
Artificial neural networks
Convolutional neural networks
Recurrent neural networks
57. What is the meaning of valid padding and same padding in CNN?
Valid padding: This is used when no padding is required. The output matrix will have the dimensions (n − f + 1) × (n − f + 1) after convolution.
Same padding: Here, padding elements are added all around the input matrix so that the output matrix has the same dimensions as the input matrix. (A worked example follows.)
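A worked example with assumed sizes, n = 6 and f = 3:

n, f = 6, 3

# Valid padding: output is (n - f + 1) x (n - f + 1)
print(n - f + 1)           # 4 -> the output is 4 x 4

# Same padding: pad p = (f - 1) / 2 cells on each side so the output stays n x n
p = (f - 1) // 2
print(n + 2 * p - f + 1)   # 6 -> the output is 6 x 6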