Istanbul Technical University
- Istanbul
Stars
An implementation of the recently introduced Tversky Neural Networks
An open source implementation of CLIP.
Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR 2023
A continuously updated collection and survey of vision-language model papers and models.
PyTorch implementation of VQ-VAE by Aäron van den Oord et al.
Evaluate your LLM's response with Prometheus and GPT4 💯
The official repo of "On the Perception Bottleneck of VLMs for Chart Understanding"
Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
[NAACL 2025] Beyond End-to-End VLMs: Leveraging Intermediate Text Representations for Superior Flowchart Understanding
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
This repository contains demos I made with the Transformers library by HuggingFace.
An easy way to extract information from documents
A curated list of resources on Document Understanding (DU)
Framework-agnostic sliced/tiled inference + interactive UI + error analysis plots
A collection of AWESOME language modeling techniques on tabular data applications.
A collection of papers on large language models (LLMs) for table-related tasks, e.g., using LLMs for table QA ("Table + LLM" papers).
MEG: Medical Knowledge-Augmented Large Language Models for Question Answering
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
🦜🔗 Build context-aware reasoning applications