Deep Learning | Lecture 1 | Revision Questions
By: Mohamed Khairi
Section 1: Introduction to Neural Networks & Deep Learning
1. Q: What is an artificial neuron?
A: A computing unit that receives inputs, applies weights, passes the weighted sum through a non-linear function, and produces an output.
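As a minimal sketch of the definition above (pure Python, with hypothetical weights and a sigmoid as the non-linear function):

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of inputs plus bias, squashed by a sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # non-linear activation

# Two inputs, illustrative weights and bias
print(neuron([1.0, 2.0], [0.5, -0.25], 0.1))  # a value in (0, 1)
```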
2. Q: What are the components of an artificial neural network (ANN)?
A: Neurons, weighted edges, activation functions, and layers (input, hidden, output).
3. Q: What does it mean when we say ANN is “feedforward”?
A: The signal flows from input to output in one direction — no loops.
4. Q: What is backpropagation?
A: A learning algorithm that adjusts weights by propagating the error backward.
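The idea can be sketched for a single linear neuron trained on y = 2x (illustrative only; real backpropagation applies the same chain rule layer by layer through the whole network):

```python
def train(samples, lr=0.1, epochs=50):
    """Fit y_hat = w * x by repeatedly propagating the error
    backward into a weight update (squared-error loss)."""
    w = 0.0
    for _ in range(epochs):
        for x, y in samples:
            y_hat = w * x                  # forward pass
            grad = 2 * (y_hat - y) * x     # d(error)/dw via chain rule
            w -= lr * grad                 # adjust weight against the gradient
    return w

print(train([(1.0, 2.0), (2.0, 4.0)]))  # converges toward 2.0
```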
Section 2: Deep Learning, AI, ML – Definitions
5. Q: What is Artificial Intelligence (AI)?
A: Building hardware/software systems that emulate human-like capabilities such as reasoning, planning, and acting.
6. Q: What is Machine Learning (ML)?
A: A subset of AI that allows systems to learn from data without explicit programming.
7. Q: What is Deep Learning (DL)?
A: A type of machine learning that uses deep neural networks to learn from large
datasets.
8. Q: How are AI, ML, and DL related?
A: AI ⊃ ML ⊃ DL (DL is inside ML, and ML is inside AI).
Section 3: Representation Learning
9. Q: What is representation learning?
A: Automatically learning to transform raw data (such as images or text) into useful features.
10. Q: What are common architectures for representation learning?
A: DNNs, CNNs, RNNs, GNNs, Deep Belief Networks.
11. Q: Where is representation learning used?
A: In computer vision, speech recognition, NLP, bioinformatics, drug discovery, and
diagnostics.
Section 4: Structured vs Unstructured Data
12. Q: What is the difference between structured and unstructured data?
A: Structured data is organized in rows/columns (like tables), unstructured data includes
text, images, audio, etc.
13. Q: Give examples of unstructured data.
A: Images, speech recordings, and natural language text.
Section 5: Learning Types
14. Q: What is supervised learning?
A: Learning from labeled data (input features and correct output labels).
15. Q: What is unsupervised learning?
A: Learning patterns from data that has no labels — no supervision signals.
16. Q: Name common methods of unsupervised learning.
A: PCA, k-means clustering, autoencoders.
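One of these methods can be sketched in a few lines: a bare-bones 1-D k-means (pure Python, hypothetical data points and starting centers) that alternates assignment and centroid-update steps with no labels involved:

```python
def kmeans_1d(points, centers, iters=10):
    """Alternate: assign each point to its nearest center,
    then move each center to the mean of its cluster."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[i].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

print(kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 8.5], [0.0, 5.0]))
```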
17. Q: What is the goal of unsupervised learning?
A: To find structure or good representations in the data.
18. Q: How are supervised and unsupervised learning related?
A: Supervised learning models the conditional probability p(y | x), while unsupervised learning models the joint probability p(x); some problems can be converted between the two via joint or conditional probability modeling.
Section 6: Gradient-Based Learning
19. Q: What is gradient-based learning?
A: A method that updates weights by calculating how the cost changes with each
parameter.
20. Q: Why is gradient descent used in neural networks?
A: Because the cost function is non-linear and non-convex with no closed-form solution, so following the gradient downhill is a practical way to find minima.
21. Q: What are challenges of using gradient descent in neural networks?
A: No guarantee of convergence, sensitive to initialization, risk of local minima.
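The update rule itself is simple; a sketch on an illustrative convex cost f(w) = (w − 3)², whose derivative is 2(w − 3):

```python
def gradient_descent(w=0.0, lr=0.1, steps=100):
    """Repeatedly step opposite the derivative of f(w) = (w - 3)**2."""
    for _ in range(steps):
        grad = 2 * (w - 3)   # how the cost changes with the parameter
        w -= lr * grad       # move against the gradient
    return w

print(gradient_descent())  # approaches the minimum at w = 3
```

On a convex cost like this one, convergence is guaranteed for a small enough learning rate; the challenges listed above arise precisely because real network costs are not convex.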
Section 7: Cost Functions and Learning
22. Q: What is the purpose of a cost function?
A: To measure the error between predicted and actual values and guide learning.
23. Q: What is cross-entropy loss?
A: A cost function used in classification, measuring the difference between predicted
probability and actual label.
24. Q: When do we use mean squared error (MSE)?
A: When predicting continuous values, often in regression tasks.
25. Q: What is the difference between cross-entropy and MSE?
A: Cross-entropy is better for classification; MSE suits regression, but combined with saturating outputs it can suffer from gradient saturation.
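Both losses can be computed by hand on a tiny hypothetical example (pure Python, binary labels and predicted probabilities):

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: average squared difference."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, y_pred):
    """Average negative log-likelihood of the true labels."""
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [1, 0, 1]
y_pred = [0.9, 0.2, 0.8]
print(mse(y_true, y_pred))                   # 0.03
print(binary_cross_entropy(y_true, y_pred))  # about 0.184
```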
Section 8: Output Units
26. Q: What are common output unit types in neural networks?
A: Linear, sigmoid, and softmax units.
27. Q: When is sigmoid used?
A: For binary classification tasks — outputs values between 0 and 1.
28. Q: What is softmax used for?
A: For multi-class classification — outputs a probability distribution over classes.
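Both output units can be sketched in pure Python (the max-subtraction in softmax is a standard numerical-stability trick, not part of the definition):

```python
import math

def sigmoid(z):
    """Binary output: squashes one score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    """Multi-class output: turns scores into a probability distribution."""
    m = max(zs)                             # subtract max for stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)        # one probability per class
print(sum(probs))   # sums to 1
```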
29. Q: What is a problem with MSE + sigmoid or softmax?
A: It causes gradients to vanish when outputs saturate, slowing learning.
Section 9: Hidden Units (Activation Functions)
30. Q: What is ReLU?
A: Rectified Linear Unit, the most common activation function, returns 0 for negatives
and x for positives.
31. Q: What are some ReLU alternatives?
A: Leaky ReLU, ELU, SELU, GELU, PReLU.
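ReLU and its leaky variant are one-liners; the difference is that Leaky ReLU lets a small gradient through for negative inputs instead of zeroing them:

```python
def relu(x):
    """Returns 0 for negatives and x for positives."""
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    """Like ReLU, but negatives are scaled by a small slope alpha."""
    return x if x > 0 else alpha * x

print(relu(-2.0), relu(3.0))   # 0.0 3.0
print(leaky_relu(-2.0))        # -0.02
```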
32. Q: When is tanh preferred over sigmoid?
A: When a smoother activation is needed, and zero-centered output is beneficial.
33. Q: What is the role of hidden units?
A: To apply non-linear transformations and learn complex patterns in data.
Section 10: Architecture Design
34. Q: What defines a neural network’s architecture?
A: Number of layers, number of units per layer, and how they are connected.
35. Q: What are depth and width in neural networks?
A: Depth = number of layers, width = number of neurons per layer.
36. Q: How do we choose the right architecture?
A: By experimenting and monitoring validation performance.
Section 11: Regularization Techniques
37. Q: What is L1 regularization (Lasso)?
A: It adds a penalty proportional to the sum of absolute weights — encourages sparsity.
38. Q: What is L2 regularization (Weight Decay)?
A: It adds a penalty proportional to the sum of squared weights — shrinks weights
smoothly.
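The two penalty terms are easy to compute directly (hypothetical weight vector and regularization strength lambda; in training, the chosen penalty is added to the cost function):

```python
def l1_penalty(weights, lam=0.01):
    """Lasso: lambda * sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam=0.01):
    """Weight decay: lambda * sum of squared weights."""
    return lam * sum(w * w for w in weights)

w = [0.5, -1.5, 2.0]
print(l1_penalty(w))  # 0.01 * 4.0  = 0.04
print(l2_penalty(w))  # 0.01 * 6.5  = 0.065
```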
39. Q: What does dropout do?
A: Randomly turns off some neurons during training to reduce overfitting.
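A sketch of the common "inverted dropout" formulation (pure Python; the rescaling by 1/keep keeps the expected activation unchanged, so nothing special is needed at test time):

```python
import random

def dropout(activations, rate=0.5, training=True, seed=None):
    """Zero each unit with probability `rate` during training,
    rescaling survivors; pass activations through unchanged otherwise."""
    if not training:
        return list(activations)
    rng = random.Random(seed)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0
            for a in activations]

print(dropout([1.0, 2.0, 3.0, 4.0], rate=0.5, seed=0))
```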
40. Q: What is early stopping?
A: Stops training when validation error starts to increase — prevents overfitting.
41. Q: What is data augmentation?
A: Applies random changes (like flipping, rotating) to training data to improve
generalization.
42. Q: Name 3 advanced regularization methods.
A: Label smoothing, Mixup/CutMix, adversarial training.
Section 12: Universal Approximation Theorem
43. Q: What does the Universal Approximation Theorem state?
A: A feedforward network with at least one hidden layer (and enough hidden units) can approximate any continuous function to any desired accuracy.
44. Q: What’s the limitation of the theorem in practice?
A: The network may exist, but we may not learn it correctly due to optimization or
overfitting.
45. Q: Why are deep networks better than shallow ones?
A: They can express functions more efficiently — using fewer neurons and less
computation.
Section 13: TensorFlow and Tensors
46. Q: What is TensorFlow?
A: An open-source ML platform by Google for building and training neural networks.
47. Q: What are tensors in deep learning?
A: Multi-dimensional arrays used to represent data (like scalars, vectors, matrices).
48. Q: What are basic tensor operations?
A: Element-wise math, broadcasting, reshaping.
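These operations behave the same way in NumPy, which is a convenient stand-in for TensorFlow tensors in a quick sketch (hypothetical arrays):

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # reshape a flat vector into a 2x3 matrix
b = np.array([10, 20, 30])       # shape (3,)

print(a + b)                     # broadcasting: b is added to each row
print(a * 2)                     # element-wise math
print(a.reshape(3, 2).shape)     # reshaping: (3, 2)
```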
Section 14: Keras & PyTorch
49. Q: What is Keras?
A: A high-level deep learning API that runs on top of TensorFlow for fast model
building.
50. Q: What is PyTorch mainly used for?
A: Academic and research-based deep learning — known for flexibility and ease of use.