Stars
Face Forgery Video Detection via Temporal Forgery Cue Unraveling
🔥 [ICLR 2025] FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models
A collection of useful Dify DSL workflows, suitable for both personal use and learning.
Production-ready platform for agentic workflow development.
Official inference repo for FLUX.1 models
🐙 Guides, papers, lessons, notebooks and resources for prompt engineering, context engineering, RAG, and AI Agents.
Official code for Forensics Adapter (CVPR'25).
A lightweight LMM-based Document Parsing Model
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
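Ultra-lightweight Chinese OCR with vertical text recognition; supports ncnn, mnn, and tnn inference (dbnet (1.8M) + crnn (2.5M) + anglenet (378KB)); the total model size is only 4.7M.
Multi-function, multi-engine OCR toolkit: text recognition, translation, read-aloud, speech synthesis, machine translation of Japanese manga and games into Chinese, CAPTCHA recognition, image-host upload, reverse image search, and QR code scanning.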
A tutorial on LLM application development for beginner developers. Read online at: https://datawhalechina.github.io/llm-universe/
"A White-Box Guide to Building Large Models": a fully hand-built Tiny-Universe.
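"A Guide to Using Open-Source Large Models": tutorials tailored for Chinese beginners on quickly fine-tuning (full-parameter/LoRA) and deploying Chinese and international open-source LLMs and multimodal LLMs (MLLMs) in a Linux environment.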
Align Anything: Training All-modality Model with Feedback
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
An open-source implementation for fine-tuning the Qwen-VL series by Alibaba Cloud.
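PDF parsing tool: a vLLM-accelerated implementation of GOT, with MinerU handling layout detection and cropping and GOT parsing tables and formulas, for PDF parsing in RAG.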
A high-throughput and memory-efficient inference and serving engine for LLMs
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
⚡️ Fast, ultra-accurate text extraction from any image or PDF—including challenging ones—with structured markdown output powered by vision models.
[CVPR2023] Towards Robust Tampered Text Detection in Document Image: New Dataset and New Solution