Stars
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN
Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines
[ECCV 2024] Code for DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior
Turn any face into a video game character, pixel art, claymation, 3D or toy
ControlNet collections for Flux1-dev model, Trained by TheMisto.ai Team
Images to inference with no labeling (use foundation models to train supervised models).
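Free, open-source, offline OCR software. Supports screenshots and batch image import, PDF document recognition, watermark and header/footer exclusion, and QR code scanning/generation. Includes built-in multilingual libraries.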
Official repository for CVPR2022 publication, ViM: Out-Of-Distribution with Virtual-logit Matching
[CVPR 2023] The official implementation of CVPR 2023 paper "Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes"
iCartoonFace dataset and baseline approaches; the project is supported by iQIYI
Official PyTorch Implementation for "Splicing ViT Features for Semantic Appearance Transfer" presenting "Splice" (CVPR 2022 Oral)
Official implementation code of the paper <AnyText: Multilingual Visual Text Generation And Editing>
[AAAI2024] FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning
StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation
Mapping of ImageNet and Wikidata for Knowledge Graphs Enabled Computer Vision
ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering
[NeurIPS 2023] This repository includes the official implementation of our paper "An Inverse Scaling Law for CLIP Training"
Keras implementation of the Yahoo Open-NSFW model
Refine high-quality datasets and visual AI models
Train transformer language models with reinforcement learning.
GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration.
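Wenda: an LLM invocation platform. It aims at efficient content generation for specific environments, while accounting for the limited computing resources of individuals and small and medium-sized enterprises, as well as knowledge security and privacy concerns.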
ChatGLM2-6B: An Open Bilingual Chat LLM
Real-time face swap for PC streaming or video calls
Official implementation for "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tuned and more will be updated)
RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". So it's combining the best of RN…