  • @ntu_aiailab
  • Room 542, CSIE Building, National Taiwan University No. 1, Sec. 4, Roosevelt Road, Da’an Dist.


🔉 Play and Record Sound with Python 🐍

Python 1,211 154 Updated Dec 15, 2025

Python bindings for llama.cpp

Python 9,907 1,275 Updated Aug 15, 2025

Building an inclusive, scalable, and high-performance multilingual translation model

Python 119 9 Updated Jan 10, 2026

Fun-ASR is an end-to-end speech recognition large model launched by Tongyi Lab.

Python 736 55 Updated Jan 14, 2026

A Corpus of Southern Min Dialect for Automatic Speech Recognition

Python 7 1 Updated Aug 27, 2024

https://deep-learning-101.github.io/Speech-Processing Speech Processing (語音處理)

22 2 Updated Jan 4, 2026

playwright project sample

HTML 37 14 Updated Sep 19, 2022

Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn without Internet connection. Support iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, L…

C++ 1,611 203 Updated Oct 20, 2025

Python Interface for the Popular mermaid-js Library, Simplified for Diagram Creation

Python 157 13 Updated Jan 13, 2026

This repository focuses on leveraging OpenAI's Whisper model for speech recognition in Chinese (Mandarin) and Taiwanese Hokkien languages. It includes tools and scripts for data preprocessing, mode…

Python 65 11 Updated Mar 1, 2025

This Python script detects and decodes QR codes in real time from a live webcam feed. It is a handy tool for instant QR code scanning applications, such as inventory management and digital ticketing.

Python 6 1 Updated Aug 6, 2023

CnOCR: Awesome Chinese/English OCR Python toolkits based on PyTorch. It comes with 20+ well-trained models for different application scenarios and can be used directly after installation. [Based on PyTor…

Python 3,722 535 Updated Sep 21, 2025

Implementation of the Leiden algorithm for various quality functions to be used with igraph in Python.

Python 722 88 Updated Jan 12, 2026

A TTS model capable of generating ultra-realistic dialogue in one pass.

Python 19,044 1,665 Updated Nov 19, 2025

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic.

Python 28,781 3,529 Updated Dec 5, 2025

📄 Awesome OCR toolkits for multiple programming languages, based on ONNXRuntime, OpenVINO, PaddlePaddle and PyTorch.

Python 5,668 549 Updated Jan 12, 2026

Machine Learning Containers for NVIDIA Jetson and JetPack-L4T

Jupyter Notebook 4,160 774 Updated Jan 12, 2026

YOLOE: Real-Time Seeing Anything [ICCV 2025]

Python 1,998 188 Updated Jun 26, 2025

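"Open-Source Large Model Usage Guide": a tutorial tailored for Chinese beginners on quickly fine-tuning (full-parameter/LoRA) and deploying Chinese and international open-source large language models (LLMs) and multimodal large models (MLLMs) in a Linux environment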

Jupyter Notebook 27,559 2,753 Updated Jan 14, 2026

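🤖 wukong-robot is a simple, flexible, and elegant Chinese voice dialogue robot / smart speaker project. It supports ChatGPT multi-turn conversation and may also be the first open-source smart speaker project to support brain-computer interaction.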

Python 7,078 1,410 Updated Oct 25, 2024

A repository for storing models that have been inter-converted between various frameworks. Supported frameworks are TensorFlow, PyTorch, ONNX, OpenVINO, TFJS, TFTRT, TensorFlowLite (Float32/16/INT8…

Python 4,045 630 Updated Dec 10, 2025

Real-time face swap and one-click video deepfake with only a single image

Python 78,260 11,413 Updated Dec 15, 2025

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Python 19,696 2,110 Updated Oct 21, 2025

A Python package to build AI-powered real-time audio applications

Python 1,911 154 Updated Feb 12, 2025
Jupyter Notebook 35 2 Updated Jan 6, 2026

Application for viewing Rich Transcription Time Marked (RTTM) files in an interactive way

Python 48 5 Updated Apr 19, 2023

A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.

Python 9,337 793 Updated Jul 11, 2025

A nearly-live implementation of OpenAI's Whisper.

Python 3,743 512 Updated Jan 13, 2026

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 155,358 31,778 Updated Jan 18, 2026

A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.

1,835 236 Updated Jul 22, 2025