
ShapeGen3D is a project that uses UNET and CLIP to generate 3D shapes like circles, squares, and triangles from textual descriptions. This repository includes the necessary code, models, and documentation to create and visualize these shapes using deep learning techniques.


Text-to-Image Generation Using CLIP and U-Net

Author: Jay Shah

Date: 04-26-2024

This project implements a text-to-image generation model using a pre-trained CLIP model for text encoding and a simple U-Net architecture for image generation. The model learns to generate images based on textual descriptions by leveraging diffusion-based denoising techniques.


Table of Contents

  • Introduction
  • Dataset
  • Preprocessing
  • Model Architecture
  • Training
  • Generating Images
  • Installation
  • Usage
  • Results
  • Contact

Introduction

This project utilizes:

  • CLIP Model: To encode text into feature vectors.
  • U-Net Architecture: To generate images by first down-sampling and then up-sampling the input.
  • Diffusion Model: To progressively refine the generated image over multiple iterations.
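The diffusion component can be illustrated with the standard forward-noising step. This is a minimal sketch in plain PyTorch; the schedule length and beta range here are illustrative defaults, not necessarily the values used in this repository, and `make_schedule`/`q_sample` are hypothetical helper names.

```python
import torch

def make_schedule(T=300, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule and the cumulative alpha products used for noising."""
    betas = torch.linspace(beta_start, beta_end, T)
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
    return betas, alphas_cumprod

def q_sample(x0, t, alphas_cumprod):
    """Noise a clean image x0 to timestep t:
    x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
    eps = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps
    return x_t, eps

_, a_bar = make_schedule()
x0 = torch.rand(4, 3, 64, 64)    # batch of clean images
t = torch.randint(0, 300, (4,))  # a random timestep per image
x_t, eps = q_sample(x0, t, a_bar)
```

During training the network sees `x_t` and tries to recover `eps`; at generation time the same schedule is walked in reverse.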

Dataset

The dataset consists of images labeled with different text descriptions of geometric shapes in different arrangements. The images are stored in DATA_DIR, and each label represents a different spatial configuration of shapes.

To access the dataset, email me at [email protected].


Preprocessing

The dataset is preprocessed by:

  1. Loading images and filtering non-image files.
  2. Applying transformations such as resizing, flipping, and normalizing pixel values.
  3. Converting images to tensors for PyTorch processing.

Model Architecture

The model consists of:

  • Text Encoder: A linear layer projecting CLIP-generated text features.
  • U-Net Architecture: A simplified U-Net with residual connections to preserve image details.
  • Time Embedding: To incorporate timestep information into the model.
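The three pieces can be sketched as a single conditioned block: a linear projection of CLIP text features, a sinusoidal timestep embedding, and a residual convolution. This is an illustrative reconstruction, not the repository's exact module; the class names and dimensions (512 for CLIP features, 128 for the time embedding) are assumptions.

```python
import math
import torch
import torch.nn as nn

class SinusoidalTimeEmbedding(nn.Module):
    """Standard sinusoidal embedding of the diffusion timestep."""
    def __init__(self, dim):
        super().__init__()
        self.dim = dim

    def forward(self, t):
        half = self.dim // 2
        freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)
        args = t.float()[:, None] * freqs[None, :]
        return torch.cat([args.sin(), args.cos()], dim=-1)

class TextConditionedBlock(nn.Module):
    """Projects CLIP text features and the time embedding into the channel
    dimension, injects both into the image features, and adds a residual
    connection to preserve image detail."""
    def __init__(self, channels, text_dim=512, time_dim=128):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, channels)
        self.time_proj = nn.Linear(time_dim, channels)
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x, text_feat, t_emb):
        cond = self.text_proj(text_feat) + self.time_proj(t_emb)
        h = self.conv(x) + cond[:, :, None, None]  # broadcast over H, W
        return x + torch.relu(h)                   # residual connection

t_emb = SinusoidalTimeEmbedding(128)(torch.tensor([10, 50]))
block = TextConditionedBlock(channels=32)
out = block(torch.randn(2, 32, 16, 16), torch.randn(2, 512), t_emb)
```

A full U-Net stacks such blocks around down-sampling and up-sampling stages, with skip connections between matching resolutions.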

Training

  • Optimizer: Adam optimizer with a learning rate of 0.001.
  • Loss Function: L1 loss between predicted and actual noise in diffusion.
  • Training Duration: 100 epochs with periodic loss logging.
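A single training step under these settings looks roughly like the following. The tiny convolution stands in for the U-Net so the sketch is self-contained; only the optimizer (Adam, lr=0.001) and the L1 loss on predicted vs. actual noise come from the list above.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 3, 3, padding=1)  # stand-in for the U-Net noise predictor
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss_fn = nn.L1Loss()

def train_step(x0, alphas_cumprod):
    """One diffusion training step: noise x0, predict the noise, take L1 loss."""
    t = torch.randint(0, len(alphas_cumprod), (x0.size(0),))
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
    eps = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps  # forward noising
    pred = model(x_t)                                   # predicted noise
    loss = loss_fn(pred, eps)                           # L1(actual, predicted)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

a_bar = torch.cumprod(1 - torch.linspace(1e-4, 0.02, 300), dim=0)
loss = train_step(torch.rand(4, 3, 32, 32), a_bar)
```

Training repeats this step over the dataset for 100 epochs, logging the loss periodically.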

Generating Images

To generate images from text:

```python
text = "square on top of circle"
generate_text_to_image_samples(text)
```

This runs the model to iteratively refine an image based on the input text prompt.
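The iterative refinement inside `generate_text_to_image_samples` follows the usual reverse-diffusion recipe: start from pure noise and repeatedly subtract the predicted noise. This sketch uses a placeholder model and an assumed `model(x, t)` signature, so it shows the loop structure rather than the repository's exact sampler.

```python
import torch

@torch.no_grad()
def sample(model, betas, shape=(1, 3, 64, 64)):
    """Reverse diffusion: denoise from pure noise, one timestep at a time.
    `model(x, t)` is assumed to predict the noise present at timestep t."""
    alphas = 1.0 - betas
    a_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)
    for t in reversed(range(len(betas))):
        eps = model(x, torch.full((shape[0],), t))
        coef = betas[t] / (1 - a_bar[t]).sqrt()
        x = (x - coef * eps) / alphas[t].sqrt()   # remove predicted noise
        if t > 0:
            x = x + betas[t].sqrt() * torch.randn_like(x)  # re-inject noise
    return x

betas = torch.linspace(1e-4, 0.02, 50)
img = sample(lambda x, t: torch.zeros_like(x), betas)  # placeholder "model"
```

In the real pipeline, the model would also receive the CLIP-encoded text features so that the denoising trajectory is steered by the prompt.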


Installation

Prerequisites:

  • Python 3.x
  • PyTorch
  • NumPy
  • Matplotlib
  • Pillow (PIL)
  • torchvision
  • OpenAI CLIP

Installation Steps:

```shell
pip install torch torchvision numpy matplotlib pillow clip-by-openai
```

Usage

  1. Clone the repository:

```shell
git clone <repo_link>
cd <repo_folder>
```

  2. Update DATA_DIR with the path to your dataset.
  3. Run the training script:

```shell
python train.py
```

  4. Generate images:

```shell
python generate.py
```

Results

The model generates progressively better images over the 100 training epochs. Sample results can be visualized using:

```python
sample_plot_image()
```

Contact

For dataset access or inquiries, email me at:
📧 [email protected]
