Stars
MiroThinker is a series of open-source agentic models trained for deep research and complex tool use scenarios.
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
An "undetected" (bot-detection-evading) Python version of the Playwright testing and automation library.
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
DIAMOND (DIffusion As a Model Of eNvironment Dreams) is a reinforcement learning agent trained in a diffusion world model. NeurIPS 2024 Spotlight.
Hardware-synchronized device for FAST-LIVO (Handheld & UAV).
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
SEED-Voken: A Series of Powerful Visual Tokenizers
A keyboard shortcut browser extension for keyboard-based navigation and tab operations with an advanced omnibar
Open-Source Low-Latency Accelerated Linux WebRTC HTML5 Remote Desktop Streaming Platform for Self-Hosting, Containers, Kubernetes, or Cloud/HPC
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
OCR, layout analysis, reading order, table recognition in 90+ languages
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
Utils for streaming large files (S3, HDFS, gzip, bz2...)
SAPIEN Manipulation Skill Framework, an open source GPU parallelized robotics simulator and benchmark, led by Hillbot, Inc.
Code for a series of works in LiDAR perception, including SST (CVPR 22), FSD (NeurIPS 22), FSD++ (TPAMI 23), FSDv2, and CTRL (ICCV 23, oral).
Easily compute CLIP embeddings and build a CLIP retrieval system with them
DeepSeek LLM: Let there be answers
Dobb·E: An open-source, general framework for learning household robotic manipulation
Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)
DeepSeek Coder: Let the Code Write Itself
A PyTorch implementation of the paper "All are Worth Words: A ViT Backbone for Diffusion Models".
Official Repository for "Eureka: Human-Level Reward Design via Coding Large Language Models" (ICLR 2024)
Prompts for GPT-4V & DALL-E 3 to fully utilize their multi-modal abilities. GPT-4V prompts, DALL-E 3 prompts.
✨✨Latest Advances on Multimodal Large Language Models
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
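For the large-file streaming utilities listed above (S3, HDFS, gzip, bz2), a minimal usage sketch follows. It assumes the Python `smart_open` package, which matches that description; the entry itself does not name the library, and the bucket, key, and per-line handling below are purely illustrative.

```python
# Minimal sketch assuming the Python smart_open package (an assumption;
# the list entry does not name the library). Bucket and key are
# illustrative placeholders.
from smart_open import open as smart_open_open

# Stream a gzip-compressed S3 object line by line; decompression is
# handled transparently based on the .gz extension, and the object is
# never fully downloaded into memory at once.
with smart_open_open("s3://example-bucket/logs/day.log.gz", "r") as fin:
    for line in fin:
        print(line.rstrip())  # replace with real per-line processing
```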