IBM CDL (currently)
- BJ, China
Stars
A Go daemon that syncs MongoDB to Elasticsearch in real time. You know, for search.
MinerU-HTML: An SLM-powered HTML main content extractor that outputs clean HTML bodies. Perfect for Deep Research Agents, RAG applications, and training data generation.
A Python package for interacting with the MinerU Vision-Language Model.
Data browser based on S3: a visualization tool for S3-hosted data (json / jsonl / parquet / html / md, etc.). 👇 Try online.
🤖 The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, dif…
Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.
SGLang is a high-performance serving framework for large language models and multimodal models.
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek, Qwen, Llama, Gemma, TTS 2x faster with 70% less VRAM.
A high-throughput and memory-efficient inference and serving engine for LLMs
Production-ready platform for agentic workflow development.
UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition
[ICLR 2025 Spotlight] The official implementation of the paper “LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models”
A Comprehensive Toolkit for High-Quality PDF Content Extraction
Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.
[ICCV 2025] The official implementation of the paper “Street-to-Satellite Image Synthesis with Diffusion Models and BEV Paradigm”
WanJuan-CC is a high-quality dataset built from CommonCrawl through data extraction, rule-based cleaning, deduplication, safety filtering, and quality cleaning.
[ECCV 2024 Best Paper Candidate & TPAMI 2025] PointLLM: Empowering Large Language Models to Understand Point Clouds
Data annotation toolbox that supports image, audio, and video data.
[AAAI 2023] Official PyTorch implementation of the paper "ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency".
Data annotation component library, provided as NPM packages
Data Set Description Language Specification (DSDL, a next-generation description language for AI datasets)