Stars
Automate your mobile devices with natural language commands - an LLM-agnostic mobile agent 🤖
💫 Toolkit to help you get started with Spec-Driven Development
A Model Context Protocol (MCP) server that provides structured spec-driven development workflow tools for AI-assisted software development, featuring a real-time web dashboard and VSCode extension …
A collection of literature after or concurrent with Masked Autoencoder (MAE) (Kaiming He et al.).
🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN
PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning
[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
FILM: Frame Interpolation for Large Motion, In ECCV 2022.
WACV2024 - Scale-Adaptive Feature Aggregation for Efficient Space-Time Video Super-Resolution
BMBC: Bilateral Motion Estimation with Bilateral Cost Volume for Video Interpolation, ECCV 2020
Official source code for our paper "AdaCoF: Adaptive Collaboration of Flows for Video Frame Interpolation" (CVPR 2020)
ECCV2022 - Real-Time Intermediate Flow Estimation for Video Frame Interpolation
🎯 Production-ready implementation of video prediction models using PyTorch. Features Enhanced ConvLSTM with temporal attention, PredRNN with spatiotemporal memory, and Transformer-based architecture.
AndroidWorld is an environment and benchmark for autonomous agents
An artificial intelligence platform for StarCraft II with large-scale distributed training and grand-master agents.
(JAIR'2022) A mini-scale reproduction code of the AlphaStar program. Note: the original AlphaStar is the AI proposed by DeepMind to play StarCraft II. JAIR = Journal of Artificial Intelligence Rese…
Official repo of "MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents". It can be used to evaluate a GUI agent in a hierarchical manner across multiple platforms, includi…
Video Quality Assessment using Deep Learning (CNN + LSTM)
Owl Eyes: Spotting UI Display Issues via Visual Understanding
🔥 Latest advances in Video Object Segmentation (VOS) – papers, datasets, and projects.
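An online tool for batch-downloading WeChat Official Account articles. It supports exporting view counts and comment data, requires no environment setup, and can be used through the online website; private deployment via Docker and deployment on Cloudflare are also supported. It supports downloading in a variety of file formats, and the HTML format reproduces the article layout and styling 100%.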
A development-oriented visualization toolkit
AgentScope: Agent-Oriented Programming for Building LLM Applications
[ACL 2025] Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
An open-source database of AI models.