PRITHIV SAKTHI U R (PRITHIVSAKTHIUR)

🎯 Focusing

Computer Vision, Multimodal AI

123 followers · 0 following

Stars

PRITHIVSAKTHIUR / FLUX.2-klein-LoRA-DLC

Python · 1 star · Updated Jan 21, 2026

PRITHIVSAKTHIUR / LTX-2-LoRAs-Camera-Control-Dolly

Demonstration for the Lightricks LTX-2 Distilled model, enhanced with specialized LoRA adapters for cinematic camera movements (dolly left/right/in/out, jib up/down, static). Generates animated vid…

Python · 1 star · Updated Jan 11, 2026

PRITHIVSAKTHIUR / Qwen-Image-Edit-Object-Manipulator

Demonstration for the Qwen/Qwen-Image-Edit-2511 model, specialized in object manipulation via lazy-loaded LoRA adapters. Supports adding or removing specific elements (e.g., logos, accessories, clo…

Python · 1 star · Updated Jan 5, 2026

PRITHIVSAKTHIUR / Qwen-Image-Edit-2511-LoRAs-Fast-Lazy-Load

Demonstration for the Qwen-Image-Edit-2511 model with lazy-loaded LoRA adapters for advanced single- and multi-image editing. Supports 7 specialized LoRAs including photo-to-anime, multi-angle came…

Python · 2 stars · Updated Jan 16, 2026

PRITHIVSAKTHIUR / TRELLIS.2-Text-to-3D-RERUN

A Gradio app with Rerun visualization for Microsoft's TRELLIS.2-4B model that generates textured 3D assets (GLB) from text or images using a two-stage pipeline: text-to-image (Z-Image-Turbo) then i…

Python · 4 stars · Updated Dec 28, 2025

PRITHIVSAKTHIUR / Qwen-Image-Edit-2511-LoRAs-Fast-Multi-Image-Rerun

Experimental demonstration for the Qwen/Qwen-Image-Edit-2511 model with lazy-loaded LoRA adapters supporting multi-image input editing. Users can upload one or more images (gallery format) and appl…

Python · 7 stars · 2 forks · Updated Dec 27, 2025

PRITHIVSAKTHIUR / Qwen-Image-Edit-2511-LoRAs-Fast-Single-Image-Rerun

Experimental demonstration for the Qwen/Qwen-Image-Edit-2511 model with lazy-loaded LoRA adapters for single-image editing tasks. Features specialized edits like photo-to-anime conversion and multi…

Python · 1 star · Updated Dec 27, 2025

PRITHIVSAKTHIUR / NVIDIA-Nemotron-Parse-OCR

Demonstration for NVIDIA's Nemotron-Parse-v1.1 model, designed for advanced document parsing and OCR. Upload images of documents (e.g., papers, forms) to extract structured content: text, tables (L…

Python · 2 stars · Updated Dec 24, 2025

PRITHIVSAKTHIUR / Qwen-Image-Edit-2509-LoRAs-Fast-Lazy-Load

Demonstration for the Qwen/Qwen-Image-Edit-2509 model, featuring lazy-loaded LoRA adapters for fast, specialized image edits like photo-to-anime conversion, angle changes, lighting restoration, ski…

Python · 1 star · Updated Dec 23, 2025

PRITHIVSAKTHIUR / Qwen-Image-Edit-2509-LoRAs-Fast-Fusion-Lazy-Load

Demonstration for the Qwen/Qwen-Image-Edit-2509 model, enhanced with lazy-loaded LoRA adapters for specialized image editing tasks like texture application, object fusion, material transfer, and li…

Python · 1 star · Updated Dec 22, 2025

PRITHIVSAKTHIUR / SAGE-MM-Video-Reasoning

A Gradio-based demonstration for the AllenAI SAGE-MM-Qwen3-VL-4B-SFT_RL multimodal model, specialized in video reasoning tasks. Users upload MP4 videos, provide natural language prompts (e.g., "Des…

Python · 5 stars · Updated Dec 21, 2025

PRITHIVSAKTHIUR / TRELLIS.2-Text-to-3D

TRELLIS.2-Text-to-3D is an end-to-end Text-to-3D and Image-to-3D generation app that enables users to create high-quality 3D GLB assets either by generating an image from a text prompt or by upload…

Python · 1 star · Updated Dec 22, 2025

PRITHIVSAKTHIUR / Molmo2-HF-Demo

A Gradio-based demonstration for the AllenAI Molmo2-8B multimodal model, enabling image QA, multi-image pointing, video QA, and temporal tracking. Users upload images or videos, provide natural lan…

Python · 3 stars · Updated Dec 24, 2025

PRITHIVSAKTHIUR / Z-Image-Turbo-LoRA-DLC

A Gradio-based demonstration for the Tongyi-MAI/Z-Image-Turbo diffusion pipeline, enhanced with a curated collection of LoRAs (Low-Rank Adaptations) for style transfer and creative image generation…

Python · 2 stars · 1 fork · Updated Dec 24, 2025

PRITHIVSAKTHIUR / Gliese-CUA-Tool-Call-8B-Localization-Demo

A Gradio-based demonstration for the prithivMLmods/Gliese-CUA-Tool-Call-8B model, specialized in GUI element localization. Users upload UI screenshots, provide task instructions (e.g., "Click on t…

Python · 1 star · Updated Dec 15, 2025

PRITHIVSAKTHIUR / Gliese-CUA-Tool-Call-8B-Demo

A Gradio-based demonstration for the prithivMLmods/Gliese-CUA-Tool-Call-8B model, a Computer Use Agent (CUA) specialized in GUI understanding and tool-calling actions.

Python · 1 star · Updated Dec 15, 2025

PRITHIVSAKTHIUR / Herculis-CUA-GUI-Actioner-4B-Demo

Demo: Herculis-CUA-GUI-Actioner-4B is a Computer Use Agent (CUA) multimodal model designed for GUI understanding, UI localization, and action execution across web, desktop, and mobile environments.

Python · 1 star · Updated Dec 14, 2025

PRITHIVSAKTHIUR / mergekit-ops

Mergekit supports architectures such as Llama, Mistral, and others, enabling merges on CPU or GPU with low memory needs through lazy tensor loading and out-of-core processing. It handles …

Python · 1 star · Updated Dec 14, 2025

PRITHIVSAKTHIUR / CUA-GUI-Operator

A Gradio-based demonstration for Computer Use Agent (CUA) tasks, supporting multiple vision-language models: Microsoft Fara-7B, ByteDance UI-TARS-1.5-7B, Hcompany Holo2-4B, and Uniphore ActIO-UI-7B…

Python · 3 stars · Updated Dec 24, 2025

PRITHIVSAKTHIUR / Fara-7B-GUI-Operator

A Gradio-based demonstration for the Microsoft Fara-7B model, designed as a computer use agent. Users upload UI screenshots (e.g., desktop or app interfaces), provide task instructions (e.g., "Clic…

Python · 5 stars · Updated Dec 8, 2025

PRITHIVSAKTHIUR / Vision-to-VibeVoice-en

A Gradio-based demo for end-to-end vision-to-speech inference: it extracts text or descriptions from images using Qwen2.5-VL-7B-Instruct, then converts them to natural speech audio via Microsoft VibeVoice-Re…

Python · 3 stars · 1 fork · Updated Dec 8, 2025

PRITHIVSAKTHIUR / HunyuanOCR-Demo

A Gradio-based demonstration application for the Tencent HunyuanOCR model, focused on optical character recognition (OCR) tasks such as text detection, extraction, and coordinate formatting from im…

Python · 3 stars · Updated Dec 4, 2025

PRITHIVSAKTHIUR / Super-OCRs-Demo

A Gradio-based demo application for comparing state-of-the-art OCR models: DeepSeek-OCR, Dots.OCR, HunyuanOCR, and Nanonets-OCR2-3B.

Python · 9 stars · 2 forks · Updated Nov 28, 2025

institutional / grin-transfer

A production-ready tool for libraries to retrieve digital copies from Google Books.

Python · 10 stars · 1 fork · Updated Jan 5, 2026

PRITHIVSAKTHIUR / Nano-Banana-Pro-Sketch-Board

A web-based sketching application that allows users to draw or sketch ideas on a canvas and transform them into generated images using Google's Gemini AI models.

TypeScript · 5 stars · Updated Nov 23, 2025

PRITHIVSAKTHIUR / Qwen-Image-Edit-2509-LoRAs-Fast-Fusion

Qwen-Image-Edit-2509-LoRAs-Fast-Fusion is a fast, interactive web application built with Gradio that enables advanced image editing using the Qwen/Qwen-Image-Edit-2509 model from Alibaba's Qwen tea…

Python · 5 stars · 3 forks · Updated Dec 12, 2025

PRITHIVSAKTHIUR / SAM3-Image-Segmentation

SAM3 Image Segmentation is a user-friendly web application built with Gradio that leverages the Segment Anything Model 3 (SAM3) from Meta AI to perform zero-shot instance segmentation on images usi…

Python · 6 stars · 1 fork · Updated Nov 22, 2025

PRITHIVSAKTHIUR / Qwen-3VL-Multimodal-Understanding

Uses the Qwen3-VL-4B-Instruct model from Alibaba's Qwen series for multimodal tasks involving images and text. It enables users to upload an image and perform various vision-language tasks, such as querying…

Python · 5 stars · Updated Nov 18, 2025

PRITHIVSAKTHIUR / FineTuning-MetaCLIP-2

Demonstrates fine-tuning a large-scale pretrained model, MetaCLIP 2, for a specific downstream task: image classification.

Jupyter Notebook · 2 stars · Updated Nov 15, 2025

PRITHIVSAKTHIUR / Qwen-Image-Edit-2509-LoRAs-Fast

Qwen-Image-Edit-2509-LoRAs-Fast is a high-performance, user-friendly web application built with Gradio that leverages the advanced Qwen/Qwen-Image-Edit-2509 model from Hugging Face for seamless ima…

Python · 14 stars · 2 forks · Updated Dec 23, 2025

Lists (3)

Chat Bot ℹ️

🔮 Future ideas

Models 🚀
