This type of image classification is called semantic image segmentation. It is similar to object detection in that both answer the question "What objects are in this image, and where are they located?" The difference is that object detection labels objects with bounding boxes, which may include pixels that are not part of the object, whereas semantic image segmentation predicts a precise mask for each object by labeling every pixel in the image with its corresponding class. The word "semantic" refers to what is being shown: for example, the "Car" class is indicated below by the dark blue mask, and the "Person" class by a red mask:
Figure 1: Example of a segmented image

As you might imagine, region-specific labeling is crucial for self-driving cars, which need a pixel-level understanding of their environment so they can change lanes and avoid other cars or any other traffic obstacle that could put people's lives in danger.
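To make the contrast between a bounding box and a per-pixel mask concrete, here is a minimal sketch in plain Python. The tiny label map, the class ID `1` for "Car", and the helper names are all made up for illustration; they are not taken from the project's code or from the CARLA dataset.

```python
# Toy 5x6 label map: 0 = background, 1 = "Car" (class IDs are illustrative).
label_map = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]


def bounding_box(label_map, class_id):
    """Return (top, left, bottom, right), inclusive, enclosing every pixel of class_id."""
    coords = [
        (r, c)
        for r, row in enumerate(label_map)
        for c, value in enumerate(row)
        if value == class_id
    ]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)


def mask_area(label_map, class_id):
    """Count the pixels that are actually labeled with class_id."""
    return sum(value == class_id for row in label_map for value in row)


def box_area(box):
    """Count the pixels covered by an inclusive (top, left, bottom, right) box."""
    top, left, bottom, right = box
    return (bottom - top + 1) * (right - left + 1)


box = bounding_box(label_map, 1)
car_pixels = mask_area(label_map, 1)
# Pixels inside the box that do not belong to the car.
extra_pixels = box_area(box) - car_pixels
print(box, car_pixels, extra_pixels)
```

In this toy case the box covers 12 pixels while the mask covers only the 8 that belong to the car, which is the kind of slack that per-pixel labeling removes.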
Key takeaways of the project:
- Building a U-Net
- Explaining the difference between a regular CNN and a U-Net
- Training the model with the Adam optimizer on the CARLA self-driving car dataset of 1000+ images
- Implementing semantic image segmentation on the CARLA self-driving car dataset
- Applying sparse categorical crossentropy for pixelwise prediction
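
As an illustration of the last point, the sketch below shows what sparse categorical crossentropy computes when applied per pixel: a softmax over the class scores of each pixel, followed by the negative log-probability of that pixel's integer class label, averaged over the image. It is written in plain Python so the arithmetic is visible. The function names, the 1x2 toy image, and the three classes are assumptions made for this example. If the project uses Keras, this should correspond to `tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)`, but that is an assumption rather than something stated here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one pixel's class scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pixelwise_sparse_cce(logits, labels):
    """Mean of -log p(true class) over all pixels.

    logits: H x W x C nested lists of raw class scores.
    labels: H x W nested lists of integer class IDs ("sparse" labels,
            as opposed to one-hot vectors).
    """
    losses = []
    for logit_row, label_row in zip(logits, labels):
        for pixel_logits, true_class in zip(logit_row, label_row):
            probs = softmax(pixel_logits)
            losses.append(-math.log(probs[true_class]))
    return sum(losses) / len(losses)


# Hypothetical 1x2 image with 3 classes.
# Pixel 0: the model is undecided (uniform scores), true class is 1.
# Pixel 1: the model is confident in class 0, which is the true class.
logits = [[[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]]]
labels = [[1, 0]]

loss = pixelwise_sparse_cce(logits, labels)
print(round(loss, 4))
```

The undecided pixel contributes a loss of ln 3, while the confidently correct pixel contributes almost nothing, so training pushes every pixel's score toward its labeled class.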
Outputs using 40 epochs:




