This project offers a user-friendly interface for working with the Llama-3.2-11B-Vision and Molmo-7B-D models.
Both 4-bit quantized variants, Llama-3.2-11B-Vision-bnb-4bit and Molmo-7B-D-bnb-4bit, require about 12 GB of VRAM to run.
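If you want to check the ~12 GB requirement before downloading a model, a minimal sketch (assuming PyTorch with CUDA support is installed, which the setup steps below cover):

```python
def has_enough_vram(required_gib: float = 12) -> bool:
    """Report whether the first CUDA device has at least `required_gib`
    GiB of total memory. Illustrative sketch; assumes PyTorch is installed."""
    import torch  # imported lazily so this sketch loads even without torch

    if not torch.cuda.is_available():
        return False
    total_bytes = torch.cuda.get_device_properties(0).total_memory
    return total_bytes >= required_gib * 1024**3
```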
The code is tested on Ubuntu 22.04.5 with Python 3.10.12.
Model selection is done via the command line.
To set up and run this project on your local machine, follow the steps below:
Clone the repository to a convenient location on your computer:

```bash
git clone <repository-url>
cd <repository-directory>
```

Inside the cloned repository, create a virtual environment:

```bash
python -m venv venv-ui
```

Activate the virtual environment:

```bash
.\venv-ui\Scripts\activate
```

With the virtual environment active, install the dependencies from requirements.txt:

```bash
pip install -r requirements.txt
```

Install Torch and TorchVision with separate commands:

```bash
pip install torch==2.4.1+cu121 --index-url https://download.pytorch.org/whl/cu121
pip install torchvision==0.19.1+cu121 --index-url https://download.pytorch.org/whl/cu121
```

To start the UI, you can either:
- Use the `run.bat` script (Windows only): simply double-click on `run.bat`.

or

- Activate the virtual environment:
  - Windows: `.\venv-ui\Scripts\activate`
  - Linux: `source venv-ui/bin/activate`

  Then run the Python script:

  ```bash
  python clean-ui.py
  ```
You can use the Gradio client to programmatically script prompts and retrieve JSON files with descriptions. For example, the following command retrieves descriptions for the two images in the `img/` subdirectory:

```bash
python3 client.py img/preview.png img/selection.png
```
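As a sketch of what such a scripted client looks like with the `gradio_client` library (the endpoint name `/predict`, the argument order, and the helper names here are assumptions — check `client.py` for the actual API):

```python
import json


def describe_images(image_paths, server_url="http://127.0.0.1:7860/",
                    prompt="Describe this image."):
    """Hypothetical helper mirroring client.py: send each image to a
    running Clean-UI Gradio server and collect the text descriptions."""
    # Imported lazily so this sketch loads without gradio_client installed.
    from gradio_client import Client, handle_file

    client = Client(server_url)
    results = {}
    for path in image_paths:
        # api_name="/predict" is an assumption; inspect the app's
        # "Use via API" page for the real endpoint name and signature.
        results[path] = client.predict(handle_file(path), prompt,
                                       api_name="/predict")
    return results


def save_descriptions(results, out_path="descriptions.json"):
    # Persist the prompt/response pairs as JSON, as client.py does.
    with open(out_path, "w") as f:
        json.dump(results, f, indent=2)
```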
- Upload an image and enter a prompt to generate an image description.
- Adjustable parameters such as temperature, top-k, and top-p for more control over the generated text.
- Chatbot history to display prompt-response interactions.
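Temperature, top-k, and top-p are standard text-generation controls. As an illustrative sketch of how they shape the next-token distribution (this is not the project's actual sampling code):

```python
import math


def sample_filter(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Apply temperature scaling, top-k, and top-p (nucleus) filtering
    to raw logits and return the resulting probability distribution."""
    # Temperature: <1 sharpens the distribution, >1 flattens it.
    scaled = [l / temperature for l in logits]
    # Top-k: keep only the k highest logits (0 disables the filter).
    if top_k > 0:
        kth = sorted(scaled, reverse=True)[top_k - 1]
        scaled = [l if l >= kth else float("-inf") for l in scaled]
    # Softmax over the remaining logits.
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Top-p: keep the smallest set of tokens whose cumulative
    # probability reaches top_p, then renormalize.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = set(), 0.0
    for i in order:
        kept.add(i)
        cum += probs[i]
        if cum >= top_p:
            break
    masked = [p if i in kept else 0.0 for i, p in enumerate(probs)]
    s = sum(masked)
    return [p / s for p in masked]
```

Lower top-k / top-p values make the output more deterministic; higher values allow more varied descriptions.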
This project is licensed under the MIT License. See the LICENSE file for more details.