- Palo Alto, CA, United States
- https://lilyzhng.github.io/
- in/lilyzhng
- https://scholar.google.com/citations?user=la-Mx-UAAAAJ&hl=en
Starred repositories
IROS 2025 Workshop Page
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
Inference, fine-tuning, and many more recipes with the Gemma family of models
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model with performance approaching GPT-4o.
Code for CVPR2025 paper: Generating Multimodal Driving Scenes via Next-Scene Prediction
SplatAD: Real-Time Lidar and Camera Rendering with 3D Gaussian Splatting for Autonomous Driving
Curated list of papers and resources focused on 3D Gaussian Splatting, intended to keep pace with the anticipated surge of research in the coming months.
FreeVS: Generative View Synthesis on Free Driving Trajectory
Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas"
Starting point for the Women in AI RAG Hackathon, Jan 25 2025
High-Resolution 3D Assets Generation with Large Scale Hunyuan3D Diffusion Models.
Synthetic data curation for post-training and structured data extraction
Official Implementation of paper "MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion"
New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos
LLM powered retrieval engine designed to process a ton of sources to collect a comprehensive list of entities.
[IROS 2023] DualCross: Cross-Modality Cross-Domain Adaptation for Monocular BEV Perception
(T-IV, ITSC) Auto-labeling of point cloud sequences for 3D object detection using an ensemble of experts and temporal refinement
Fully typed & consistent chat APIs for OpenAI, Anthropic, Groq, and Azure's chat models for browser, edge, and node environments.
Get structured, fully typed, and validated JSON outputs from OpenAI and Anthropic models.
Painter & SegGPT Series: Vision Foundation Models from BAAI
An open-source project dedicated to tracking and segmenting any objects in videos, either automatically or interactively. The primary algorithms utilized include the Segment Anything Model (SAM) fo…
An open-source framework for training large multimodal models.