Vision-language model tailored for tasks that involve [messy] optical character recognition (OCR), image-to-text conversion, and math problem solving with LaTeX formatting.

pillow video-processing opencv-python video-understanding ocr-recognition ocr-python huggingface-transformers qwen2-vl-2b qwen2-5-vl monkey-ocr

Updated Jul 26, 2025
Python

PRITHIVSAKTHIUR / Qwen-Image-Edit-2509-LoRAs-Fast

Qwen-Image-Edit-2509-LoRAs-Fast is a web application built with Gradio that uses the Qwen/Qwen-Image-Edit-2509 model from Hugging Face for image editing tasks.

python kernel numpy torch pytorch peft torchvision diffusion-models huggingface-transformers huggingface-spaces diffusers flash-attention-3 qwen2-5-vl qwen-image-edit qwen3-vl qwen-image-edit-2509 aoti

Updated Dec 23, 2025
Python

zhangguanghao523 / CMMCoT

Official implementation of CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation

mcot cot chain-of-thought mllm multimodel-large-language-model qwen2-vl qwen2-5-vl

Updated Dec 5, 2025
Python

cilabuniba / artseek

ArtSeek: Deep artwork understanding via multimodal in-context reasoning and late interaction retrieval

computer-vision deep-learning multimodal-learning multimodal vision-language large-language-models llm mllm multimodal-large-language-models retrieval-augmented-generation qwen qwen2-5 qwen2-5-vl

Updated Aug 6, 2025
Jupyter Notebook

PRITHIVSAKTHIUR / Super-OCRs-Demo

A Gradio-based demo application for comparing state-of-the-art OCR models: DeepSeek-OCR, Dots.OCR, HunyuanOCR, and Nanonets-OCR2-3B.

python ocr pillow torch accelerate supervision gradio opencv-python nanonets torchvision sentencepiece huggingface-transformers huggingface-spaces flash-attention-2 hunyuan qwen2-5-vl dots-ocr deepseek-ocr easydict

Updated Nov 28, 2025
Python

PRITHIVSAKTHIUR / Qwen3-VL-Outpost

Qwen3-VL-Outpost is a Gradio-based web application for vision-language tasks, leveraging multiple Qwen vision-language models to process images and videos.

torch gradio opencv-python video-understanding huggingface-transformers huggingface-spaces vision-language-model qwen2-vl qwen2-5-vl qwen3-vl

Updated Nov 1, 2025
Python

smsk-01 / GRPO-Trainer-Images

GRPO trainer for vision-language models (VLMs).

images grpo qwen2-5-vl grpovlm grpoimages

Updated Oct 8, 2025
Python

PRITHIVSAKTHIUR / SAGE-MM-Video-Reasoning

A Gradio-based demonstration for the AllenAI SAGE-MM-Qwen3-VL-4B-SFT_RL multimodal model, specialized in video reasoning tasks. Users upload MP4 videos, provide natural language prompts (e.g., "Describe this video in detail" or custom questions), and receive detailed textual analyses.

torch accelerate gradio opencv-python torchvision huggingface-transformers decord video-reasoning huggingface-spaces qwen2-5-vl qwen3-vl molmo2

Updated Dec 21, 2025
Python

PRITHIVSAKTHIUR / Multimodal-OCR3

Multimodal-OCR3 is an advanced Optical Character Recognition (OCR) application that leverages multiple state-of-the-art multimodal models to extract text from images.

ocr pillow pytorch matplotlib ocr-recognition nanonets inference-optimization huggingface-transformers vision-transformer huggingface-models sota-model huggingface-spaces vision-language-model multimodal-large-language-models qwen2-5-vl qwen3-vl chandra-ocr dotsocr olmocr2

Updated Nov 11, 2025
Python

PRITHIVSAKTHIUR / Qwen-Image-Edit-2509-LoRAs-Fast-Fusion

Qwen-Image-Edit-2509-LoRAs-Fast-Fusion is a fast, interactive web application built with Gradio that enables advanced image editing using the Qwen/Qwen-Image-Edit-2509 model from Alibaba's Qwen team. It leverages specialized LoRA adapters for efficient, low-step inference (as few as 4 steps).

Updated Dec 12, 2025
Python

PRITHIVSAKTHIUR / Qwen-3VL-Multimodal-Understanding

Uses the Qwen3-VL-4B-Instruct model from Alibaba's Qwen series for multimodal tasks involving images and text. Users can upload an image and perform vision-language tasks such as querying details, generating captions, and detecting points of interest.

torch pytorch pip accelerate supervision gradio multimodal torchvision huggingface-transformers roboflow huggingface-spaces vision-language-model pillow-library llama-cpp qwen2-5-vl qwen3-vl

Updated Nov 18, 2025
Python

Kathan-max / RAG-Enhanced-Chatbot-with-LoRA-Fine-Tuning

Transform your documents into intelligent conversations. This open-source RAG chatbot combines semantic search with fine-tuned language models (LLaMA, Qwen2.5VL-3B) to deliver accurate, context-aware responses from your own knowledge base. Join our community!

Updated Aug 13, 2025
Python

PRITHIVSAKTHIUR / Tiny-VLMs-Lab

Tiny VLMs Lab is a Hugging Face Space and open-source project showcasing lightweight Vision-Language Models for image captioning, OCR, reasoning, and multimodal understanding. It offers a simple Gradio interface to upload images, query models, adjust generation settings, and export results in Markdown or PDF.

ocr cuda optical-character-recognition gradio multimodality captioning-images huggingface-transformers vision-transformer hugging-face huggingface-spaces vision-language-model flash-attention-2 vlms qwen2-5-vl

Updated Nov 26, 2025
Python

qwen2-5-vl

Here are 49 public repositories matching this topic...

2U1 / Qwen-VL-Series-Finetune

sophgo / LLM-TPU

thaoshibe / relsim

yuanc3 / DATE

PRITHIVSAKTHIUR / OCR-ReportLab-Notebooks

liuyifan22 / Qwen2.5-VL-Batched

o-l-l-i / simple-captioner

PRITHIVSAKTHIUR / Multimodal-OCR
