qwen2-vl

This project demonstrates how to use the Qwen2-VL model from Hugging Face for Optical Character Recognition (OCR) and Visual Question Answering (VQA). The model combines vision and language capabilities, enabling users to analyze images and generate context-based responses.

optical-character-recognition visual-question-answering qwen2-vl

Updated Oct 18, 2024
Jupyter Notebook

see2023 / autoXHS

An intelligent search assistant based on multimodal large models that uses AI to enable smart information retrieval and knowledge integration on the Xiaohongshu platform.

spider selenium-webdriver xiaohongshu llm qwen2-vl

Updated Nov 6, 2024
Python

ZachcZhang / Qwen2-VL-inference

An open-source server implementation for running inference with the Qwen2-VL series of models, built with FastAPI.

inference fastapi huggingface mllm qwen2-vl

Updated Nov 20, 2024
Python

tatsuya-fukuoka / Qwen2-VL-demo

Demo notebook for Qwen2-VL

vlm qwen2-vl

Updated Nov 20, 2024
Jupyter Notebook

KhadgaA / Amazon-ML-Challenge

This repo contains the winning code for Amazon ML Challenge 2024. The challenge was to develop a Machine Learning model to extract product entity details directly from the product images.

computer-vision vqa visual-question-answering amazon-ml-challenge vision-language-model llama-factory qwen2-vl

Updated Nov 30, 2024
Python

Yatish54321 / Flipkart_Grid_6.0_Robotics_level2_model

"Smart Vision Technology for Quality Control" uses computer vision to automate product inspections, extracting details like product name, quantity, expiry date, and freshness from images. Built for Flipkart Grid 6.0, it enhances accuracy and efficiency in quality control, minimizing manual checks.

huggingface-transformers genai qwen2-vl qwen2-vl-2b

Updated Dec 4, 2024
Jupyter Notebook

Valdanitooooo / chat_with_qwen2_vl_test

qwen2-vl

Updated Dec 27, 2024
Python

Pavansomisetty21 / Qwen2-Vision-Finetuning-Unsloth---Maths-OCR-Formulae-Extraction-

We fine-tune an Unsloth Llama model to extract mathematical formulas from images using optical character recognition (OCR).

ocr llama maths optical-character-recognition vlm ocr-recognition llm vision-language-model qwen2 unsloth qwen2-vl

Updated Jan 8, 2025
Jupyter Notebook

HemantM29 / Multimodal-Document-Analysis-and-Query-Retrieval

This project performs multimodal document analysis and query retrieval by downloading PDFs, converting pages to images, indexing them for semantic search, and analyzing retrieved images using visual-language models like Qwen2VL and Blip2.

transformers natural-language-queries semantic-search pdf-processing image-indexing multimodal-analysis blip2 retrieval-augmented-generation visual-language-models qwen2-vl

Updated Jan 11, 2025
Jupyter Notebook

SimonGino / repoicon

Use AI to generate polished, minimalist icons for your GitHub repositories.

icon-generator llm qwen2-vl

Updated Feb 5, 2025
TypeScript

851543 / Qwen2.5-VL-Server

Qwen2.5-VL-Server

python3 pytorch fastapi qwen2-vl

Updated Mar 9, 2025
Python

aws-samples / multi-modal-examples-for-amazon-sagemaker

A workshop collecting multimodal LLM examples, samples, reference architectures, and demos on Amazon SageMaker.

sagemaker multi-modality sagemaker-example sagemaker-studio llm vllm video-llava internvl2 qwen2-vl

Updated Mar 16, 2025
Jupyter Notebook

anusha-chebolu / multimodal-rag

A multimodal RAG application using Qwen 2.5 VL, ColPali, and QdrantDB for text and image-based retrieval.

rag mutimodal qdrant-vector-database colpali qwen2-vl

Updated Mar 20, 2025
Jupyter Notebook

PRITHIVSAKTHIUR / Aya-Vision-Ocr-vs-Qwen2VL-Ocr

Messy Handwriting OCR Comparison Between Aya-Vision-8B and Qwen2VL-OCR-2B

ocr image-to-text huggingface-transformers vision-transformer qwen2-vl aya-vision

Updated Mar 22, 2025
Python

polymathbenchmark / polymathbenchmark.github.io

A Challenging Multi-Modal Mathematical Reasoning Benchmark

benchmark vision multimodal gpt-4 llm gemini-vision-pro llama3 claude-3-5-sonnet qwen2-vl openai-o1

Updated Apr 13, 2025
JavaScript

Dishu-Bansal / Documatic

An AI-powered document organizer tool. It displays a small, cute robot on the screen. Give it any file and an optional short description, and it will analyze the contents and description and save the file to the cloud. When needed, just double-click it and enter a description or keywords for the file you are looking for; it will open the best-matching file/

machine-learning deep-learning neural-network artificial-intelligence ai-engineering llm qwen2-vl

Updated Apr 15, 2025
C

Here are 55 public repositories matching this topic...

soulteary / dify-with-qwen-vl

Kazuhito00 / Qwen2-VL-Colaboratory-Sample

BUAADreamer / Qwen2-VL-History

silvererudite / generative-ai

shaadclt / Qwen2-VL-OCR-VQA
