AmmarkoV/clean-ui

Clean-UI for Multi-Modal Vision Models

This project offers a user-friendly interface for working with the Llama-3.2-11B-Vision and Molmo-7B-D models.

The 4-bit (bitsandbytes) versions used here, Llama-3.2-11B-Vision-bnb-4bit and Molmo-7B-D-bnb-4bit, both need 12GB of VRAM to run.
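As a quick pre-flight check, the VRAM requirement can be tested from Python. This is a minimal sketch, not part of the repository: the helper names are hypothetical, it treats 12GB as 12 × 10^9 bytes, and it looks only at the total memory of GPU 0 rather than the memory currently free.

```python
REQUIRED_GB = 12  # figure quoted in this README


def has_enough_vram(total_bytes, required_gb=REQUIRED_GB):
    """Return True if total_bytes covers required_gb (decimal gigabytes, an assumption)."""
    return total_bytes >= required_gb * 1000**3


def detect_vram_bytes():
    """Return the total memory of GPU 0 in bytes, or None if it cannot be determined."""
    try:
        import torch  # imported lazily so the helper above works without torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory
```

Calling has_enough_vram(detect_vram_bytes() or 0) returns False when no CUDA device is visible or the card is too small.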

The code is tested and runs on Ubuntu 22.04.5 / Python 3.10.12.

The model selection is done via the command line:

Installation

To set up and run this project on your local machine, follow the steps below:

1. Clone the Repository

Copy the repository to a convenient location on your computer:

git clone <repository-url>
cd <repository-directory>

2. Create a Virtual Environment

Inside the cloned repository, create a virtual environment using the following command:

python -m venv venv-ui

3. Activate the Virtual Environment

Activate the virtual environment. On Windows:

.\venv-ui\Scripts\activate

On Linux (for example the Ubuntu setup mentioned above):

source venv-ui/bin/activate
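The activation command depends on the platform, because Python's venv module places the activate script under Scripts on Windows and under bin elsewhere. The sketch below picks the right command for the current platform. The function name is hypothetical, and the non-Windows branch assumes a POSIX shell such as bash.

```python
import sys


def activation_command(env_dir="venv-ui", platform=sys.platform):
    """Return the shell command that activates the virtual environment."""
    if platform.startswith("win"):
        # Windows layout: <env>\Scripts\activate
        return f".\\{env_dir}\\Scripts\\activate"
    # POSIX layout (Linux, macOS): <env>/bin/activate, sourced into the shell
    return f"source {env_dir}/bin/activate"


if __name__ == "__main__":
    print(activation_command())
```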

4. Install Dependencies

After activating the virtual environment, install the necessary dependencies from requirements.txt:

pip install -r requirements.txt

Install Torch and TorchVision using separate commands:

pip install torch==2.4.1+cu121 --index-url https://download.pytorch.org/whl/cu121

and

pip install torchvision==0.19.1+cu121 --index-url https://download.pytorch.org/whl/cu121
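After installing, it can be useful to confirm that the pinned builds are the ones actually present in the environment. The following is a small sketch using the standard importlib.metadata module. The helper names are hypothetical, and the pins are copied from the two commands above.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions taken from the pip commands in this README.
PINS = {"torch": "2.4.1+cu121", "torchvision": "0.19.1+cu121"}


def installed_versions(names):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def mismatches(found, pins=PINS):
    """Return {name: (installed, expected)} for every package that differs from its pin."""
    return {
        name: (found.get(name), wanted)
        for name, wanted in pins.items()
        if found.get(name) != wanted
    }


if __name__ == "__main__":
    for name, (got, wanted) in mismatches(installed_versions(PINS)).items():
        print(f"{name}: installed {got}, expected {wanted}")
```

An empty result from mismatches means both packages match the versions listed above.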

Usage

To start the UI, you can either:

  • Use the run.bat script (Windows only): simply double-click on run.bat.

  • Or start it manually:

    1. Activate the virtual environment:

      • Windows:
        .\venv-ui\Scripts\activate
      • Linux:
        source venv-ui/bin/activate
    2. Run the Python script:

      python clean-ui.py

Client

You can use the Gradio client to script prompts programmatically and retrieve JSON files with descriptions. For example, the following command retrieves descriptions for the two images in the img/ subdirectory:

python3 client.py img/preview.png img/selection.png
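For readers who want to write their own script, the general pattern can be sketched as follows. This is not the repository's client.py. The server URL, the endpoint name "/predict", the argument order (image, then prompt), and the JSON layout are all assumptions made for illustration. Calling Client.view_api() on a running server lists the endpoints it really exposes.

```python
import json
from pathlib import Path


def describe_images(image_paths, describe, out_dir="."):
    """Call describe(path) for each image and write the result to <image stem>.json."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image_path in map(Path, image_paths):
        record = {
            "image": str(image_path),
            "description": describe(str(image_path)),
        }
        target = out_dir / f"{image_path.stem}.json"
        target.write_text(json.dumps(record, indent=2), encoding="utf-8")
        written.append(target)
    return written


def gradio_describer(url="http://127.0.0.1:7860/", api_name="/predict",
                     prompt="Describe this image."):
    """Build a describe(path) callable backed by a running Gradio server.

    ASSUMPTION: url, api_name and the (image, prompt) argument order are
    placeholders; check them against the actual server before use.
    """
    from gradio_client import Client, handle_file  # imported lazily

    client = Client(url)
    return lambda path: client.predict(handle_file(path), prompt, api_name=api_name)
```

With a server running, describe_images(["img/preview.png", "img/selection.png"], gradio_describer()) would write preview.json and selection.json to the current directory, provided the assumed endpoint matches.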

Features

  • Upload an image and enter a prompt to generate an image description.
  • Adjustable parameters such as temperature, top-k, and top-p for more control over the generated text.
  • Chatbot history to display prompt-response interactions.
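
To give a sense of what the temperature, top-k, and top-p parameters control, here is a pure-Python sketch of how they are commonly defined. It is an illustration only and is not taken from this project or from the models' generation code: temperature rescales the scores, top-k keeps the k most likely tokens, and top-p keeps the smallest set of tokens whose cumulative probability reaches p.

```python
import math
import random


def candidate_probs(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token_index: probability} after temperature, top-k and top-p filtering.

    temperature must be > 0; top_k=0 disables top-k; top_p=1.0 disables top-p.
    """
    # Temperature: values below 1 sharpen the distribution, values above 1 flatten it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep only the k most probable tokens.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]
    mass = sum(probs[i] for i in order)

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = {}, 0.0
    for i in order:
        kept[i] = probs[i] / mass
        cumulative += kept[i]
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    kept_mass = sum(kept.values())
    return {i: p / kept_mass for i, p in kept.items()}


def sample(logits, rng=random, **params):
    """Draw one token index from the filtered distribution."""
    dist = candidate_probs(logits, **params)
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]
```

Lower temperature and smaller top-k or top-p values narrow the set of candidate tokens, which tends to make the generated descriptions more deterministic.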

License

This project is licensed under the MIT License. See the LICENSE file for more details.
