Lists (3)
Sort Name ascending (A-Z)
Stars
Demonstration for the Lightricks LTX-2 Distilled model, enhanced with specialized LoRA adapters for cinematic camera movements (dolly left/right/in/out, jib up/down, static). Generates animated vid…
Demonstration for the Qwen/Qwen-Image-Edit-2511 model, specialized in object manipulation via lazy-loaded LoRA adapters. Supports adding or removing specific elements (e.g., logos, accessories, clo…
Demonstration for the Qwen-Image-Edit-2511 model with lazy-loaded LoRA adapters for advanced single- and multi-image editing. Supports 7 specialized LoRAs including photo-to-anime, multi-angle came…
A Gradio app with Rerun visualization for Microsoft's TRELLIS.2-4B model that generates textured 3D assets (GLB) from text or images using a two-stage pipeline: text-to-image (Z-Image-Turbo) then i…
Experimental demonstration for the Qwen/Qwen-Image-Edit-2511 model with lazy-loaded LoRA adapters supporting multi-image input editing. Users can upload one or more images (gallery format) and appl…
Experimental demonstration for the Qwen/Qwen-Image-Edit-2511 model with lazy-loaded LoRA adapters for single-image editing tasks. Features specialized edits like photo-to-anime conversion and multi…
Demonstration for NVIDIA's Nemotron-Parse-v1.1 model, designed for advanced document parsing and OCR. Upload images of documents (e.g., papers, forms) to extract structured content: text, tables (L…
Demonstration for the Qwen/Qwen-Image-Edit-2509 model, featuring lazy-loaded LoRA adapters for fast, specialized image edits like photo-to-anime conversion, angle changes, lighting restoration, ski…
Demonstration for the Qwen/Qwen-Image-Edit-2509 model, enhanced with lazy-loaded LoRA adapters for specialized image editing tasks like texture application, object fusion, material transfer, and li…
A Gradio-based demonstration for the AllenAI SAGE-MM-Qwen3-VL-4B-SFT_RL multimodal model, specialized in video reasoning tasks. Users upload MP4 videos, provide natural language prompts (e.g., "Des…
TRELLIS.2-Text-to-3D is an end-to-end Text-to-3D and Image-to-3D generation app that enables users to create high-quality 3D GLB assets either by generating an image from a text prompt or by upload…
A Gradio-based demonstration for the AllenAI Molmo2-8B multimodal model, enabling image QA, multi-image pointing, video QA, and temporal tracking. Users upload images or videos, provide natural lan…
A Gradio-based demonstration for the Tongyi-MAI/Z-Image-Turbo diffusion pipeline, enhanced with a curated collection of LoRAs (Low-Rank Adaptations) for style transfer and creative image generation…
A Gradio-based demonstration for the prithivMLmods/Gliese-CUA-Tool-Call-8B model, specialized in GUI element localization. Users upload UI screenshots, provide task instructions (e.g., "Click on t…
A Gradio-based demonstration for the prithivMLmods/Gliese-CUA-Tool-Call-8B model, a Computer Use Agent (CUA) specialized in GUI understanding and tool-calling actions.
Demo: Herculis-CUA-GUI-Actioner-4B is a Computer Use Agent (CUA) multimodal model designed for GUI understanding, UI localization, and action execution across web, desktop, and mobile environments
Mergekit supports various architectures like Llama, Mistral, and more models, enabling merges on CPU or GPU with low memory needs through lazy tensor loading and out-of-core processing. It handles …
A Gradio-based demonstration for Computer Use Agent (CUA) tasks, supporting multiple vision-language models: Microsoft Fara-7B, ByteDance UI-TARS-1.5-7B, Hcompany Holo2-4B, and Uniphore ActIO-UI-7B…
A Gradio-based demonstration for the Microsoft Fara-7B model, designed as a computer use agent. Users upload UI screenshots (e.g., desktop or app interfaces), provide task instructions (e.g., "Clic…
A Gradio-based demo for end-to-end vision-to-speech inference: Extract text or descriptions from images using Qwen2.5-VL-7B-Instruct, then convert to natural speech audio via Microsoft VibeVoice-Re…
A Gradio-based demonstration application for the Tencent HunyuanOCR model, focused on optical character recognition (OCR) tasks such as text detection, extraction, and coordinate formatting from im…
A Gradio-based demo application for comparing state-of-the-art OCR models: DeepSeek-OCR, Dots.OCR, HunyuanOCR, and Nanonets-OCR2-3B.
A production-ready tool for libraries to retrieve digital copies from Google Books.
A web-based sketching application that allows users to draw or sketch ideas on a canvas and transform them into generated images using Google's Gemini AI models.
Qwen-Image-Edit-2509-LoRAs-Fast-Fusion is a fast, interactive web application built with Gradio that enables advanced image editing using the Qwen/Qwen-Image-Edit-2509 model from Alibaba's Qwen tea…
SAM3 Image Segmentation is a user-friendly web application built with Gradio that leverages the Segment Anything Model 3 (SAM3) from Meta AI to perform zero-shot instance segmentation on images usi…
Qwen3-VL-4B-Instruct model from Alibaba's Qwen series for multimodal tasks involving images and text. It enables users to upload an image and perform various vision-language tasks, such as querying…
This demonstrates the process of adapting a large scale pretrained model, MetaCLIP 2, for fine tuning a specific downstream task: image classification.
Qwen-Image-Edit-2509-LoRAs-Fast is a high-performance, user-friendly web application built with Gradio that leverages the advanced Qwen/Qwen-Image-Edit-2509 model from Hugging Face for seamless ima…