This project implements a text-to-image generation model using a pre-trained CLIP model for text encoding and a simple U-Net architecture for image generation. The model learns to generate images based on textual descriptions by leveraging diffusion-based denoising techniques.
- Introduction
- Dataset
- Preprocessing
- Model Architecture
- Training
- Generating Images
- Installation
- Usage
- Results
- Contact
This project utilizes:
- CLIP Model: To encode text into feature vectors.
- U-Net Architecture: To generate images by first down-sampling and then up-sampling the input.
- Diffusion Model: To progressively refine the generated image over multiple iterations.
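To make the U-Net idea concrete, here is a minimal sketch of an encoder that down-samples once, a decoder that up-samples back, and a skip connection that carries fine detail across. The class name `TinyUNet`, the channel widths, and the use of concatenation for the skip path are illustrative assumptions, not the project's actual network, and text and timestep conditioning are left out for brevity.

```python
import torch
from torch import nn


class TinyUNet(nn.Module):
    """Hypothetical one-level U-Net: down-sample, process, up-sample, skip."""

    def __init__(self, channels: int = 3, base: int = 16):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(channels, base, 3, padding=1), nn.ReLU())
        # Strided convolution halves the spatial resolution.
        self.down = nn.Conv2d(base, base * 2, 4, stride=2, padding=1)
        self.mid = nn.Sequential(nn.Conv2d(base * 2, base * 2, 3, padding=1), nn.ReLU())
        # Transposed convolution doubles the resolution again.
        self.up = nn.ConvTranspose2d(base * 2, base, 4, stride=2, padding=1)
        # The output layer sees decoder features plus the skipped encoder features.
        self.out = nn.Conv2d(base * 2, channels, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.enc(x)            # (B, base, H, W)
        d = self.down(h)           # (B, 2*base, H/2, W/2)
        m = self.mid(d)            # (B, 2*base, H/2, W/2)
        u = self.up(m)             # (B, base, H, W)
        return self.out(torch.cat([u, h], dim=1))  # skip connection


if __name__ == "__main__":
    x = torch.randn(2, 3, 32, 32)
    print(TinyUNet()(x).shape)
```

The input height and width need to be even here so that the up-sampled tensor matches the encoder tensor it is concatenated with.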
The dataset consists of images labeled with different text descriptions of geometric shapes in different arrangements. The images are stored in DATA_DIR, and each label represents a different spatial configuration of shapes.
To access the dataset, email me at [email protected].
The dataset is preprocessed by:
- Loading images and filtering non-image files.
- Applying transformations such as resizing, flipping, and normalizing pixel values.
- Converting images to tensors for PyTorch processing.
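The first step above, keeping only image files, could look roughly like the sketch below. The helper name `list_image_files` and the set of accepted extensions are assumptions made for illustration; the project's own loader may filter differently.

```python
from pathlib import Path

# Assumed set of extensions; adjust to whatever the dataset actually contains.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".gif"}


def list_image_files(data_dir):
    """Return the image files directly inside data_dir, sorted by path."""
    root = Path(data_dir)
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
```

Calling `list_image_files(DATA_DIR)` would then give the paths to pass on to the resize, flip, and normalize steps.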
The model consists of:
- Text Encoder: A linear layer projecting CLIP-generated text features.
- U-Net Architecture: A simplified U-Net with residual connections to preserve image details.
- Time Embedding: To incorporate timestep information into the model.
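As a rough illustration of the text encoder and time embedding components, the sketch below projects CLIP text features with a linear layer and encodes the timestep with a sinusoidal embedding. The class names, the 512-dimensional input (the text feature size of CLIP ViT-B/32), the output width, and the choice of a sinusoidal embedding are all assumptions; the source does not state which CLIP variant or embedding type the project uses.

```python
import math

import torch
from torch import nn


class TextProjection(nn.Module):
    """Hypothetical linear projection of CLIP text features."""

    def __init__(self, clip_dim: int = 512, out_dim: int = 128):
        super().__init__()
        self.proj = nn.Linear(clip_dim, out_dim)

    def forward(self, text_features: torch.Tensor) -> torch.Tensor:
        # CLIP may return half precision on GPU, so cast before projecting.
        return self.proj(text_features.float())


class SinusoidalTimeEmbedding(nn.Module):
    """Hypothetical timestep embedding: sines and cosines at fixed frequencies."""

    def __init__(self, dim: int):
        super().__init__()
        if dim % 2 != 0:
            raise ValueError("dim must be even")
        self.dim = dim

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        half = self.dim // 2
        # Frequencies decay geometrically from 1 towards 1/10000.
        exponents = torch.arange(half, dtype=torch.float32, device=t.device) / half
        freqs = torch.exp(-math.log(10000.0) * exponents)
        args = t.float()[:, None] * freqs[None, :]
        return torch.cat([args.sin(), args.cos()], dim=-1)  # (B, dim)
```

Both outputs are plain feature vectors, so one simple option is to add or concatenate them to the U-Net's intermediate feature maps after broadcasting over the spatial dimensions.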
The model is trained with:
- Optimizer: Adam optimizer with a learning rate of 0.001.
- Loss Function: L1 loss between predicted and actual noise in diffusion.
- Training Duration: 100 epochs with periodic loss logging.
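A single training step with these settings might look like the sketch below: noise a clean image to a random timestep, ask the model to predict that noise, and minimise the L1 error with Adam. The linear beta schedule, the value `T = 300`, the helper names `add_noise` and `training_step`, and the `model(x_t, t, text_emb)` call signature are assumptions made for illustration and are not taken from the project's code.

```python
import torch
import torch.nn.functional as F

T = 300  # assumed number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)  # assumed linear noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)


def add_noise(x0, t, noise):
    """Forward diffusion: blend clean images x0 with noise at timesteps t."""
    acp = alphas_cumprod.to(x0.device)[t].view(-1, 1, 1, 1)
    return acp.sqrt() * x0 + (1.0 - acp).sqrt() * noise


def training_step(model, optimizer, x0, text_emb):
    """One optimisation step on the L1 noise-prediction loss."""
    t = torch.randint(0, T, (x0.shape[0],), device=x0.device)
    noise = torch.randn_like(x0)
    x_t = add_noise(x0, t, noise)
    predicted_noise = model(x_t, t, text_emb)
    loss = F.l1_loss(predicted_noise, noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

With a model in hand, the optimizer would be created as `torch.optim.Adam(model.parameters(), lr=1e-3)` and `training_step` called once per batch, logging the returned loss every few epochs.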
To generate images from text:
text = "square on top of circle"
generate_text_to_image_samples(text)
This runs the model to iteratively refine an image based on the input text prompt.
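For intuition about what "iteratively refine" means, the sketch below shows a standard DDPM-style reverse loop: start from pure noise and repeatedly subtract the model's predicted noise. It is not the body of `generate_text_to_image_samples`; the function name `sample_images`, the linear beta schedule, the default `T`, and the `model(x, t, text_emb)` signature are assumptions.

```python
import torch


@torch.no_grad()
def sample_images(model, text_emb, shape, T=300):
    """Hypothetical DDPM sampling loop conditioned on a text embedding."""
    betas = torch.linspace(1e-4, 0.02, T)  # assumed schedule
    alphas = 1.0 - betas
    alphas_cumprod = torch.cumprod(alphas, dim=0)

    x = torch.randn(shape)  # start from Gaussian noise
    for i in reversed(range(T)):
        t = torch.full((shape[0],), i, dtype=torch.long)
        eps = model(x, t, text_emb)  # predicted noise at step i
        # Mean of the reverse step given the predicted noise.
        mean = (x - betas[i] / (1.0 - alphas_cumprod[i]).sqrt() * eps) / alphas[i].sqrt()
        if i > 0:
            # Add fresh noise scaled by the posterior variance.
            var = betas[i] * (1.0 - alphas_cumprod[i - 1]) / (1.0 - alphas_cumprod[i])
            x = mean + var.sqrt() * torch.randn_like(x)
        else:
            x = mean  # final step is deterministic
    return x.clamp(-1.0, 1.0)
```

The returned tensor is in the range [-1, 1], which matches images normalised to that range; it would need rescaling to [0, 1] before plotting.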
- Python 3.x
- PyTorch
- NumPy
- Matplotlib
- Pillow (PIL)
- torchvision
- OpenAI CLIP
pip install torch torchvision numpy matplotlib pillow clip-by-openai
- Clone the repository:
git clone <repo_link>
cd <repo_folder>
- Update DATA_DIR with the path to your dataset.
- Run the training script:
python train.py
- Generate images:
python generate.py
The model progressively generates better images over 100 epochs. Sample results can be visualized using:
sample_plot_image()
For dataset access or inquiries, email me at:
📧 [email protected]