Deep Learning | Lecture 5 | Revision Questions
By: Mohamed Khairi
Section 1: Multiclass, Multi-label, and Multi-output Classification
1. Q: What is multiclass classification?
A: It's the task of classifying instances into one of three or more classes (e.g., cat, dog, or
horse).
2. Q: What is the One-vs-Rest (OvR) strategy?
A: A classifier is trained for each class, treating that class as positive and all others as
negative.
3. Q: What is the One-vs-One (OvO) strategy?
A: A binary classifier is trained for each pair of classes, with a voting system used during
prediction.
4. Q: How is multiclass classification handled in deep learning?
A: Using N output neurons and a softmax activation in the final layer.
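The N-neuron softmax output above can be sketched in plain Python (the 3-class logits are hypothetical):

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits from 3 output neurons (cat, dog, horse)
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
predicted = probs.index(max(probs))      # argmax gives the predicted class
```

Because softmax normalizes across all N neurons, exactly one class wins: raising one probability necessarily lowers the others.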
5. Q: What is multi-label classification?
A: Each instance can have multiple labels (e.g., a song could be both pop and rock).
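The multi-label case can be sketched as one independent sigmoid per label with a 0.5 threshold (labels and logits here are hypothetical):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical per-label logits for one song: [pop, rock, jazz]
labels = ["pop", "rock", "jazz"]
logits = [1.5, 0.8, -2.0]

# Each label is an independent yes/no decision (threshold 0.5),
# so a single instance may receive several labels at once.
predicted = [name for name, z in zip(labels, logits) if sigmoid(z) >= 0.5]
```

Unlike softmax, the sigmoid outputs do not compete with each other, which is what allows a song to be both pop and rock.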
6. Q: What is multi-output classification?
A: A model predicts multiple outputs simultaneously, where each output has its own
label (e.g., temperature + pressure).
7. Q: How do multi-label and multi-output classification differ?
A: Multi-label targets share one semantic space (e.g., several genres of the same song),
while multi-output targets are distinct quantities, each with its own label (e.g., temperature
and pressure).
8. Q: Give a real-world example of multi-output classification.
A: Predicting both temperature and pressure of a cooker using one model.
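In Keras, such a model can be sketched with one shared trunk and two output heads; the input size, layer widths, and names below are illustrative, not from the lecture:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical sensor input feeding a shared trunk
inputs = keras.Input(shape=(4,), name="sensor_readings")
x = layers.Dense(16, activation="relu")(inputs)

# Two separate output heads, one per predicted quantity
temperature = layers.Dense(1, name="temperature")(x)
pressure = layers.Dense(1, name="pressure")(x)

model = keras.Model(inputs, [temperature, pressure])
model.compile(optimizer="adam", loss=["mse", "mse"])  # one loss per head
```

Each head gets its own loss, so the two predictions are trained jointly while remaining independent outputs.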
Section 2: Transfer Learning
9. Q: What is transfer learning in deep learning?
A: Using a pretrained model on a new but related task to save training time and data.
10. Q: What is the advantage of using pretrained convnets?
A: They capture generic patterns like edges, shapes — useful across many tasks.
11. Q: What is ImageNet used for in transfer learning?
A: A massive dataset with 14 million images used to pretrain CNNs like VGG16.
Section 3: Feature Extraction (Without Data Augmentation)
12. Q: What is feature extraction in CNNs?
A: Using the convolutional base of a pretrained model to extract useful features, then
training a new classifier on top.
13. Q: Why is the dense layer not reused during feature extraction?
A: Because it’s task-specific, while the conv base captures general features.
14. Q: What’s the outcome of feature extraction without data augmentation?
A: ~90% accuracy, but high risk of overfitting since data is not varied.
Section 4: Feature Extraction with Data Augmentation
15. Q: What is the benefit of combining feature extraction with data augmentation?
A: Improves accuracy (~96%) and reduces overfitting.
16. Q: Why must a GPU be used for this method?
A: Because every augmented batch must pass through the entire conv base on every
epoch, which is far more expensive than extracting features once.
17. Q: What happens to conv_base.trainable = False?
A: It freezes the pretrained layers during training to prevent changing learned features.
Section 5: Fine-Tuning a Pretrained Network
18. Q: What is fine-tuning in deep learning?
A: Unfreezing some top layers of a pretrained model and retraining them with a new task.
19. Q: When should you fine-tune?
A: After training the added classifier, to adjust the high-level features without destroying
low-level ones.
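The selective unfreezing described above can be sketched as follows (the three-layer conv base and its layer names are hypothetical):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical small conv base standing in for e.g. VGG16's conv stack
conv_base = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    layers.Conv2D(8, 3, activation="relu", name="block1_conv"),
    layers.Conv2D(8, 3, activation="relu", name="block2_conv"),
    layers.Conv2D(8, 3, activation="relu", name="block3_conv"),
])

# Fine-tuning: unfreeze only the topmost layer(s), keep early ones frozen.
# The model must be recompiled afterwards for this change to take effect.
conv_base.trainable = True
for layer in conv_base.layers[:-1]:   # freeze all but the last conv layer
    layer.trainable = False

trainable_names = [l.name for l in conv_base.layers if l.trainable]
```

A low learning rate is typically used at this stage so the unfrozen layers are adjusted gently rather than overwritten.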
20. Q: Why is it risky to fine-tune early layers?
A: Because they encode generic, reusable features (edges, textures), so retraining them
yields little benefit while adding many trainable parameters, which increases the risk of
overfitting on a small dataset.
Section 6: Visualization of CNNs
21. Q: Why are CNNs not complete black boxes?
A: Their intermediate filters and activations can be visualized to understand what features
they learn.
22. Q: What is the purpose of visualizing intermediate outputs?
A: To observe how input changes through each layer and what patterns are extracted.
23. Q: What do deeper filters in CNNs learn?
A: High-level patterns like "cat ear" or "dog face" — more abstract, less pixel-level info.
24. Q: What are activation heatmaps used for?
A: To see which image areas most influenced the model’s prediction.
25. Q: What did the elephant heatmap example show?
A: The network detected African elephant ears as a key visual feature for prediction.
Section 7: Hyperparameter Tuning
26. Q: What are hyperparameters in deep learning?
A: Settings like learning rate, batch size, dropout rate, and number of layers — set
manually before training.
27. Q: What is manual hyperparameter tuning?
A: Trying different combinations manually to find the best settings.
28. Q: What are the pros and cons of manual tuning?
A: Pros: full control.
Cons: time-consuming, hard to track experiments.
29. Q: What is grid search vs. random search?
A: Grid tries all combinations systematically, while random picks combinations
randomly — often faster and better for DL.
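Random search can be sketched in plain Python with a toy search space and a stand-in objective (all names and values here are illustrative):

```python
import random

# Hypothetical search space: grid search would try all 27 combinations
search_space = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "batch_size": [16, 32, 64],
    "dropout": [0.2, 0.3, 0.5],
}

def objective(config):
    """Stand-in for 'train a model and return validation accuracy'."""
    return 1.0 - abs(config["learning_rate"] - 1e-3) - config["dropout"] * 0.1

random.seed(0)
best_config, best_score = None, float("-inf")
for _ in range(10):  # random search samples only 10 of the 27 combos
    config = {k: random.choice(v) for k, v in search_space.items()}
    score = objective(config)
    if score > best_score:
        best_config, best_score = config, score
```

In practice a library such as Keras Tuner or Optuna replaces this loop, but the idea is the same: sample configurations, evaluate, keep the best.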
30. Q: What is automated hyperparameter tuning?
A: Using algorithms to explore combinations and choose the best (e.g., Keras Tuner,
Optuna, Hyperopt).
Section 8: Quantization
31. Q: What is quantization in deep learning?
A: Reducing the numerical precision of a model (e.g., 32-bit float → 8-bit integer).
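The float-to-int8 mapping can be sketched as affine quantization with a scale and zero-point (the weight values are illustrative, not a production scheme):

```python
# Hypothetical float32 weights to be compressed to int8
weights = [-1.2, -0.4, 0.0, 0.3, 0.9]

lo, hi = min(weights), max(weights)
scale = (hi - lo) / 255.0               # the int8 range has 256 levels
zero_point = round(-128 - lo / scale)   # maps `lo` onto -128

def quantize(x):
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))       # clamp to the int8 range

def dequantize(q):
    return (q - zero_point) * scale

quantized = [quantize(x) for x in weights]
restored = [dequantize(q) for q in quantized]
```

Each value is stored in 1 byte instead of 4, and the rounding error per weight is bounded by the scale, which is the accuracy loss the question refers to.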
32. Q: What are benefits of quantization?
A: Reduces model size, speeds up inference, enables AI on mobile/edge devices.
33. Q: What is Post-Training Quantization (PTQ)?
A: Quantize a fully trained model afterward, without retraining. Quick, but may lose
some accuracy.
34. Q: What is Quantization-Aware Training (QAT)?
A: Simulates quantization during training, so the model learns to adjust to low-precision
— less accuracy loss than PTQ.
35. Q: When is QAT better than PTQ?
A: When preserving accuracy is critical, such as for medical or autonomous systems.
Section 9: Knowledge Distillation
36. Q: What is knowledge distillation?
A: Transferring knowledge from a large model (teacher) to a smaller one (student).
37. Q: Why use distillation?
A: To reduce model size while keeping high accuracy — useful for deployment on low-resource devices.
38. Q: What are the types of distillation?
A: Logit-based, feature-based, and response-based distillation.
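The logit-based variant can be sketched with a temperature-softened softmax (the teacher and student logits are hypothetical):

```python
import math

def softmax_T(logits, T):
    """Softmax with temperature T; higher T gives a softer distribution."""
    m = max(logits)
    exps = [math.exp((z - m) / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical logits from a large teacher and a small student
teacher_logits = [4.0, 1.0, 0.2]
student_logits = [3.0, 1.5, 0.1]

T = 4.0                                 # temperature softens both distributions
teacher_soft = softmax_T(teacher_logits, T)
student_soft = softmax_T(student_logits, T)

# Distillation loss: cross-entropy between the teacher's softened
# distribution and the student's (usually scaled by T^2 in practice)
distill_loss = -sum(p * math.log(q)
                    for p, q in zip(teacher_soft, student_soft))
```

The softened targets carry information about how the teacher ranks the wrong classes, which is the "dark knowledge" the student learns from.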
39. Q: Give an example of successful distillation.
A: DistilBERT is 40% smaller than BERT but retains about 97% of its language-understanding performance.
Section 10: Combining Quantization and Distillation
40. Q: Why combine quantization and distillation?
A: Quantization makes the model smaller; distillation helps recover any lost accuracy —
together, they create fast and smart models.
41. Q: What are real-world applications of quantized-distilled models?
A: Used in Siri, Google Assistant, self-driving cars, wearables, drones, and smart
cameras.
Section 11: Key Takeaways
42. Q: What are the key benefits of quantization and distillation?
A: Smaller models, faster inference, lower power use — ideal for real-time AI on edge
devices.
43. Q: What is the future trend in AI model design?
A: Focusing on efficient models rather than just bigger ones — more performance for
less power.