@ntu_aiailab
- Room 542, CSIE Building, National Taiwan University, No. 1, Sec. 4, Roosevelt Road, Da’an Dist.
Stars
🔉 Play and Record Sound with Python 🐍
Building an inclusive, scalable, and high-performance multilingual translation model
Fun-ASR is an end-to-end speech recognition large model launched by Tongyi Lab.
A Corpus of Southern Min Dialect for Automatic Speech Recognition
https://deep-learning-101.github.io/Speech-Processing Speech Processing (語音處理)
Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn, without an Internet connection. Supports iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, L…
Python Interface for the Popular mermaid-js Library, Simplified for Diagram Creation
This repository focuses on leveraging OpenAI's Whisper model for speech recognition in Chinese (Mandarin) and Taiwanese Hokkien languages. It includes tools and scripts for data preprocessing, mode…
A Python script that detects and decodes QR codes in real time from a live webcam feed. It is a handy tool for instant QR code scanning applications, such as inventory management and digital ticketing.
CnOCR: Awesome Chinese/English OCR Python toolkits based on PyTorch. It comes with 20+ well-trained models for different application scenarios and can be used directly after installation. 【基于 PyTor…
Implementation of the Leiden algorithm for various quality functions to be used with igraph in Python.
A TTS model capable of generating ultra-realistic dialogue in one pass.
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
📄 Awesome OCR toolkits for multiple programming languages, based on ONNXRuntime, OpenVINO, PaddlePaddle and PyTorch.
Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
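"A Guide to Consuming Open-Source Large Models" (《开源大模型食用指南》): a tutorial tailored for Chinese beginners on quickly fine-tuning (full-parameter/LoRA) and deploying Chinese and international open-source large language models (LLMs) and multimodal large models (MLLMs) in a Linux environment.
🤖 wukong-robot is a simple, flexible, and elegant Chinese voice-dialogue robot / smart speaker project. It supports ChatGPT multi-turn conversation and may also be the first open-source smart speaker project to support brain-computer interaction.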
A repository for storing models that have been inter-converted between various frameworks. Supported frameworks are TensorFlow, PyTorch, ONNX, OpenVINO, TFJS, TFTRT, TensorFlowLite (Float32/16/INT8…
Real-time face swap and one-click video deepfake with only a single image
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
A Python package to build AI-powered real-time audio applications
Application for viewing Rich Transcription Time Marked (RTTM) files in an interactive way
A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.
A nearly-live implementation of OpenAI's Whisper.
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.