OmniParser API

A FastAPI server wrapper for the OmniParser image-to-text model.

How it works

OmniParser is a screen parsing tool that converts GUI screens into structured elements. It uses computer vision models to:

Detect UI elements and icons
Extract text using OCR
Generate descriptions for visual elements
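
As a rough illustration of what "structured elements" could look like, the sketch below models one parsed element in Python. The class name `ParsedElement`, the helper `interactable_elements`, and every field are hypothetical, chosen only for illustration; the actual output schema of this API may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ParsedElement:
    """Hypothetical shape of one parsed screen element (not the real schema)."""
    kind: str                                # e.g. "text" (from OCR) or "icon" (from detection)
    bbox: Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2), normalized to [0, 1]
    content: str                             # OCR text or a generated icon description
    interactable: bool = False               # assumed flag for clickable/typable elements


def interactable_elements(elements: List[ParsedElement]) -> List[ParsedElement]:
    """Keep only the elements flagged as interactable."""
    return [e for e in elements if e.interactable]


# Made-up example data, not real model output.
elements = [
    ParsedElement("text", (0.10, 0.05, 0.40, 0.09), "Settings"),
    ParsedElement("icon", (0.90, 0.05, 0.95, 0.10), "Close window", interactable=True),
]
```

A GUI agent would typically filter a list like this down to the interactable elements before choosing where to click.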

📢 [Project Page] [V2 Blog Post] [Models V2] [Models V1.5] [HuggingFace Space Demo]

Local

Clone the repository
Build the Docker image:

docker build -t omni-parser .

Run the Docker container:

docker run -p 1337:1337 -e API_KEY=your_secret_key omni-parser

Access the API at http://localhost:1337
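
Once the container is running, a client might call it along the following lines. This is only a sketch: the `/parse` path, the `image` JSON field, and the `X-API-Key` header are assumptions made for illustration, so check api.py for the real route and authentication scheme. The helper builds the request without sending it.

```python
import base64
import json
import urllib.request


def build_parse_request(image_bytes: bytes, api_key: str,
                        base_url: str = "http://localhost:1337") -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a base64-encoded screenshot.

    ASSUMPTIONS: the "/parse" path, the "image" field, and the "X-API-Key"
    header are placeholders, not confirmed parts of this API.
    """
    payload = json.dumps({"image": base64.b64encode(image_bytes).decode("ascii")})
    return urllib.request.Request(
        url=f"{base_url}/parse",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


request = build_parse_request(b"fake-png-bytes", api_key="your_secret_key")

# To actually send it once the server is up:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```

The key passed here should match the `API_KEY` value given to `docker run`.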

News

[2025/2] We release OmniParser V2 checkpoints. Watch Video
[2025/2] We introduce OmniTool: Control a Windows 11 VM with OmniParser + your vision model of choice. OmniTool supports the following large language models out of the box: OpenAI (4o/o1/o3-mini), DeepSeek (R1), Qwen (2.5VL), or Anthropic Computer Use. Watch Video
[2025/1] V2 is coming. We achieve a new state-of-the-art result of 39.5% on the new grounding benchmark ScreenSpot-Pro with OmniParser V2 (to be released soon)! Read more details here.
[2024/11] We release an updated version, OmniParser V1.5, which features 1) more fine-grained/small icon detection and 2) prediction of whether each screen element is interactable. See the examples in demo.ipynb.
[2024/10] OmniParser was the #1 trending model on the Hugging Face model hub (starting 10/29/2024).
[2024/10] Feel free to check out our demo on Hugging Face Spaces! (Stay tuned for OmniParser + Claude Computer Use.)
[2024/10] Both the interactive region detection model and the icon functional description model are released! Hugging Face models
[2024/09] OmniParser achieves the best performance on Windows Agent Arena!

Install

First clone the repo, then install the environment:

cd OmniParser
conda create -n "omni" python==3.12
conda activate omni
pip install -r requirements.txt

Ensure the V2 weights are downloaded into the weights folder, and that the caption weights folder is named icon_caption_florence. If they are not, download them with:

   # download the model checkpoints to local directory OmniParser/weights/
   for f in icon_detect/{train_args.yaml,model.pt,model.yaml} icon_caption/{config.json,generation_config.json,model.safetensors}; do huggingface-cli download microsoft/OmniParser-v2.0 "$f" --local-dir weights; done
   mv weights/icon_caption weights/icon_caption_florence
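
To confirm the download and rename worked, a small check like the one below can list any files that are still missing. It is a minimal sketch: the file names come from the download command above, while the helper name `missing_weight_files` is made up for illustration and is not part of the repository.

```python
from pathlib import Path
from typing import List

# File names taken from the download command above, after renaming
# icon_caption to icon_caption_florence.
EXPECTED_FILES = [
    "icon_detect/train_args.yaml",
    "icon_detect/model.pt",
    "icon_detect/model.yaml",
    "icon_caption_florence/config.json",
    "icon_caption_florence/generation_config.json",
    "icon_caption_florence/model.safetensors",
]


def missing_weight_files(weights_dir: str = "weights") -> List[str]:
    """Return the expected weight files that do not exist under weights_dir."""
    root = Path(weights_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# Usage, run from the OmniParser directory:
# missing = missing_weight_files()
# print("All weights present" if not missing else f"Missing: {missing}")
```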

License & Attribution

This project is a wrapper around Microsoft's OmniParser. Please note the following licenses:

Original OmniParser is under CC-BY-4.0 license
Icon detection model (YOLO-based) is under AGPL license
Icon caption models (BLIP2 & Florence) are under MIT license

Gradio Demo

To run the Gradio demo:

python gradio_demo.py

Model Weights License

For the model checkpoints on the Hugging Face model hub, please note that the icon_detect model is under the AGPL license, which it inherits from the original YOLO model. The icon_caption_blip2 and icon_caption_florence models are under the MIT license. Please refer to the LICENSE file in the folder of each model: https://huggingface.co/microsoft/OmniParser.

📚 Citation

Our technical report can be found here. If you find our work useful, please consider citing it:

@misc{lu2024omniparserpurevisionbased,
      title={OmniParser for Pure Vision Based GUI Agent}, 
      author={Yadong Lu and Jianwei Yang and Yelong Shen and Ahmed Awadallah},
      year={2024},
      eprint={2408.00203},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2408.00203}, 
}
