Convolutional Neural Networks (CNNs) - Teaching Demo
How Humans See vs. How Computers See
Humans See:
- Our eyes capture light, and the brain processes it step by step.
- First, we detect edges and colors.
- Then we combine them into shapes.
- Finally, we recognize objects (like “this is a cat”).
Computers See:
- Computers don’t see objects directly — they only see an image as numbers.
- A digital image is stored as a grid of pixel values (e.g., a 28×28 image = 784 numbers; see the small sketch after this list).
- Each number represents the brightness (or color intensity) of one pixel.
- But computers don’t know what patterns mean unless we teach them.
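A minimal sketch of what "an image is just numbers" means (the pixel values below are made up for illustration, and NumPy is assumed to be available): a tiny 5×5 grayscale image stored as a grid of brightness values, which can be flattened into a plain list of numbers the same way a 28×28 image becomes 784 numbers.

```python
import numpy as np

# A tiny hypothetical 5x5 "image": each entry is a pixel brightness
# from 0 (black) to 255 (white), drawing a rough plus sign.
image = np.array([
    [  0,   0, 255,   0,   0],
    [  0, 255, 255, 255,   0],
    [255, 255, 255, 255, 255],
    [  0, 255, 255, 255,   0],
    [  0,   0, 255,   0,   0],
], dtype=np.uint8)

print(image.shape)      # (5, 5) -> 25 numbers, just like 28x28 -> 784
print(image.flatten())  # the same image as a flat list of 25 pixel values
```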
Why We Use CNN
- A raw grid of numbers is large and unstructured; teaching a computer to understand an image directly from individual pixel values is very hard.
- CNNs automatically learn patterns step by step, much like humans do (see the sketch after this list).
• Early CNN layers detect edges.
• Middle layers detect shapes or textures.
• Final layers detect full objects.
- This makes CNNs powerful: they mimic the hierarchical way humans see, but in mathematical form.
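To make the layer-by-layer idea concrete, here is a minimal sketch of a tiny CNN (assuming PyTorch; the layer sizes and the 28×28 grayscale input are illustrative choices, not part of the original notes). The early convolution can learn edge-like filters, the second convolution can combine them into textures and shapes, and the final layer maps those features to object classes.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            # Early layer: small filters that can learn edge-like patterns.
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            # Middle layer: combines edges into textures and simple shapes.
            nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Final layer: maps the learned features to object classes.
        self.classifier = nn.Linear(16 * 7 * 7, num_classes)

    def forward(self, x):
        x = self.features(x)            # (batch, 16, 7, 7) for a 28x28 input
        return self.classifier(x.flatten(1))

# Usage: a batch with one 28x28 grayscale image (e.g., an MNIST-sized digit).
model = TinyCNN()
scores = model(torch.randn(1, 1, 28, 28))
print(scores.shape)  # torch.Size([1, 10]) -> one score per class
```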
What We Used Before CNN
- Before CNNs, people used traditional machine learning + manual feature extraction.
- Feature Extraction: Engineers designed features by hand (edges, corners, textures).
Examples: SIFT, HOG, SURF.
- Classifier: These features were then fed to classifiers like SVM, kNN, or Decision Trees (a small HOG + SVM sketch follows after the limitations below).
Limitations:
- Required a lot of manual effort.
- Features didn’t generalize well.
- Accuracy was lower compared to modern deep learning.
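A minimal sketch of that older two-stage pipeline (assuming scikit-image and scikit-learn; the images and labels here are random, hypothetical data used only to show the flow): features are first extracted with hand-designed HOG descriptors, then a separate SVM classifier is trained on those features.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

def extract_hog_features(images):
    """Turn each grayscale image (H x W array) into a fixed-length HOG descriptor."""
    return np.array([
        hog(img, orientations=9, pixels_per_cell=(4, 4), cells_per_block=(2, 2))
        for img in images
    ])

# Hypothetical data: 100 random 28x28 "images" with made-up labels.
train_images = np.random.rand(100, 28, 28)
train_labels = np.random.randint(0, 10, size=100)

features = extract_hog_features(train_images)         # stage 1: manual feature extraction
clf = SVC(kernel="rbf").fit(features, train_labels)   # stage 2: classifier on top of features

test_image = np.random.rand(1, 28, 28)
print(clf.predict(extract_hog_features(test_image)))  # predicted class label
```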
Interview Demo Script
Humans recognize objects in steps: first we see edges, then shapes, then full objects.
Computers, on the other hand, only see images as grids of numbers called pixels.
Earlier, we used manual methods like SIFT and HOG to extract features, and then applied
classifiers like SVM. But these required a lot of human effort and often failed to generalize
when image conditions changed, such as lighting, angle, or scale.
CNNs solve this by automatically learning features from raw pixels, step by step, just like
humans do. That’s why CNNs have become the standard for image recognition tasks.