Stars
A collection of unusual mesh processing algorithms.
This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
This method uses Segment Anything and CLIP to ground and count any object that matches a custom text prompt, without requiring any point or box annotation.
Fast mesh to 3D gaussian splat conversion
A Modular Framework for 3D Generation and Beyond [WIP]
[CVPR 2022] Neural Face Identification in a 2D Wireframe Projection of a Manifold Object
[CVPR 2025] TreeMeshGPT: Artistic Mesh Generation with Autoregressive Tree Sequencing
[ICCV 2025] Official impl. of "MV-Adapter: Multi-view Consistent Image Generation Made Easy"
[ICCVW 2025] Find First, Track Next: Decoupling Identification and Propagation in Referring Video Object Segmentation
[ICCV2025] Referring any person or objects given a natural language description. Code base for RexSeek and HumanRef Benchmark
MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion, NeurIPS 2023 (spotlight)
[NeurIPS 2024] Direct3D: Scalable Image-to-3D Generation via 3D Latent Diffusion Transformer
[ICLR 2025] From anything to mesh like human artists. Official impl. of "MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers"
This is the official release for the paper "EFM3D A Benchmark for Measuring Progress Towards 3D Egocentric Foundation Models" (https//arxiv.org/abs/2406.10224).
High-Resolution 3D Assets Generation with Large Scale Hunyuan3D Diffusion Models.
This repository provides an improved LLamaGen Model, fine-tuned on 500,000 high-quality images, each accompanied by over 300 token prompts. And it's also powered by additional prompt refining featu…
Official repo for paper "Structured 3D Latents for Scalable and Versatile 3D Generation" (CVPR'25 Spotlight).
Tencent Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation
[ICCV 2025] Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement 🔥
A Unified Toolkit for Deep Learning Based Document Image Analysis