Image Captioning Project

An end-to-end image captioning project using Blip

📋 Project Overview

A pipeline for generating descriptive captions from images. The goal is to deliver accurate, human-like captions using deep learning architectures, while navigating resource constraints and optimizing performance.

🚩 Problem Statement

Resource bottleneck: Limited GPU memory and compute caused extremely long processing times and frequent kernel crashes.
Overfitting challenges: Initial CNN–LSTM and Transformer models struggled to generalize.
Caption length gap: Pre-trained BLIP generated captions of only 7–8 words, whereas the test set captions average 17–18 words.

🗄️ Dataset Handling

Original dataset: Full image set.
Subsetting strategy: Divided the dataset into smaller subsets and processed in batches to reduce memory footprint and prevent crashes.
Batching: Implemented batched data loading and processing to optimize runtime and keep kernel stable.

🛠️ Model Architectures & Experiments

CNN–LSTM
- Built end-to-end pipeline for visual feature extraction (CNN) and sequence generation (LSTM).
- Outcome: High overfitting; training loss decreased but validation performance stalled.
Transformer
- Replaced recurrent layers with self-attention blocks.
- Outcome: Training stability improved, but final captions did not meet quality or length expectations.
BLIP (Bootstrapping Language–Image Pre-training)
- Applied pre-trained BLIP out of the box.
- Without fine-tuning: Produced short captions (7–8 words).
- With fine-tuning: Achieved convincing and contextually rich captions matching average test-set length (~17–18 words).

🔮 Future Improvements

Data augmentation: Apply image blurring or other augmentations to enrich training signals.
Expand subset size: Increasing the subset for fine-tuning may enhance caption detail and diversity.
Hybrid models: Explore combining CNN–LSTM with attention mechanisms to balance overfitting and performance.
Hyperparameter tuning: Systematic grid search on learning rate, batch size, and architecture depth.

Readme generated to document key decisions and streamline future development.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
README.md		README.md
img-capt (1).ipynb		img-capt (1).ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Image Captioning Project

📋 Project Overview

🚩 Problem Statement

🗄️ Dataset Handling

🛠️ Model Architectures & Experiments

🔮 Future Improvements

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Image Captioning Project

📋 Project Overview

🚩 Problem Statement

🗄️ Dataset Handling

🛠️ Model Architectures & Experiments

🔮 Future Improvements

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages