- Institute of Science Tokyo (Science Tokyo, formerly Tokyo Tech)
- Tokyo, Japan
- https://scholar.google.com/citations?hl=en&user=zHAhs0IAAAAJ
Stars
Official PyTorch implementation of 'Rec-RIR: Monaural Blind Room Impulse Response Identification via DNN-based Reverberant Speech Reconstruction in STFT Domain'
DiaRemot2-ON: CPU-only audio intelligence pipeline (Faster-Whisper, ONNX, diarization, paralinguistics)
Europeanized CosyVoice2 for French & German
[ACM MM 2025] AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation
A Benchmark for Evaluating Turn-Taking and Overlap Handling in Full-Duplex Spoken Dialogue Models
PESQ (Perceptual Evaluation of Speech Quality) Wrapper for Python Users (narrow band and wide band)
A toolkit for benchmarking on a wide variety of audio deepfake datasets.
Text to speech with Hebrew G2P and TTS models based on Piper/Gemma3
T-one is a high-performance streaming ASR pipeline for Russian, specialized for the telephony domain.
This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Lan…
Official code for paper "Learning to Use Tools via Cooperative and Interactive Agents"
Inference code for Interspeech 2025 paper, "LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec"
Train transformer language models with reinforcement learning.
Repo of the paper "Towards Building an End-to-End Multilingual Automatic Lyrics Transcription Model"
Source code and additional results for the INTERSPEECH 2025 paper 'A Dataset for Automatic Assessment of TTS Quality in Spanish'
Scripts for computing the Intelligibility and CLVP scores for evaluating TTS models
Align Anything: Training All-modality Model with Feedback
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
Speech-to-text server framework with next-gen Kaldi