PAST: Phonetic-Acoustic Speech Tokenizer

Introduction

This repository contains the official implementation of PAST: Phonetic-Acoustic Speech Tokenizer.
PAST is a unified framework that captures both phonetic and acoustic representations, outperforming previous hybrid speech tokenizers on phonetic accuracy, reconstruction fidelity, and speech language modeling performance.

Unlike prior approaches that rely on self-supervised models and vocoders, PAST uses direct phonetic supervision and a joint training scheme. We also introduce a streamable, causal variant for real-time applications.


Figure: Overview of the PAST architecture.

Release

  • [2025/07] 🔥 Initial release of code, pretrained weights, and demo page

Samples

Audio samples are available on our project demo page.

Installation

You can either install PAST directly via pip or clone the repository for development.

🔧 Option 1: Install via pip

Create a fresh environment and install directly from GitHub:

conda create -n past_env python=3.10 -y
conda activate past_env
pip install git+https://github.com/slp-rl/PAST.git

🛠 Option 2: Clone and install locally

Clone the repository and install dependencies manually:

git clone https://github.com/slp-rl/PAST.git
cd PAST
conda create -n past_env python=3.10 -y
conda activate past_env
pip install -r requirements.txt

Model List

| Model | Variant | Description |
|---|---|---|
| PAST | Full | PAST model trained on LibriSpeech + TIMIT |
| PAST-streamable | Streamable | Causal variant with 20 ms look-ahead |

Usage

See the demo notebook for a full running example.

# ---------------
# load PAST model
# ---------------
import torch
from past.models.past_model import PastModel
model = PastModel.from_pretrained("PAST")  # one of ['PAST', 'PAST_streamable']

# ----------------------------------------------------------------------
# Run on audio: PAST expects a batched input format [Batch, Channels, T]
# ----------------------------------------------------------------------
import torchaudio

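def read_one_wav(path, target_sr):
    """Load an audio file as a mono tensor of shape [1, 1, T] at target_sr."""
    wav, sr = torchaudio.load(path)
    if sr != target_sr:
        wav = torchaudio.transforms.Resample(sr, target_sr)(wav)
    if wav.shape[0] > 1:
        wav = wav[:1]  # keep only the first channel of multi-channel audio
    return wav.unsqueeze(0)  # add the batch dimension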

wav = read_one_wav("assets/1089-134686-0004.flac", model.sample_rate).to(model.device)

with torch.no_grad():
    codes, scale = model.encode(wav)
    reconstructed = model.decode(codes, scale)

Full Usage Guides

Learn how to prepare your dataset for training PAST, including alignment extraction using Wav2Vec2 and manifest generation.

Step-by-step instructions for training PAST from scratch, with explanations of key configuration parameters and recommended practices.

Evaluate the model’s performance in terms of speech reconstruction quality using SI-SNR and PESQ. The guide also outlines the other evaluation metrics (PNMI, ABX, WER) and explains how we used them in our experiments.
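To make the SI-SNR metric mentioned above concrete, here is a minimal sketch of the standard scale-invariant signal-to-noise ratio. It is not the repository's evaluation script: the function name `si_snr`, its `eps` parameter, and the choice of plain Python lists (so that it runs without extra dependencies) are assumptions made for illustration.

```python
import math


def si_snr(reference, estimate, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length sample sequences.

    Illustrative helper only; not part of the PAST package.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    n = len(reference)

    # Remove the mean so a constant offset does not affect the score.
    ref_mean = sum(reference) / n
    est_mean = sum(estimate) / n
    ref = [x - ref_mean for x in reference]
    est = [x - est_mean for x in estimate]

    # Project the estimate onto the reference to isolate the target component.
    dot = sum(r * e for r, e in zip(ref, est))
    ref_energy = sum(r * r for r in ref) + eps
    scale = dot / ref_energy
    target = [scale * r for r in ref]

    # Whatever is left after removing the target counts as noise.
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(v * v for v in noise) + eps

    return 10.0 * math.log10(target_energy / noise_energy + eps)
```

With the tensors from the Usage section, one would presumably flatten `wav` and `reconstructed` to sample lists and trim them to a common length before calling this function, since a decoder may return a slightly different number of samples. For real recordings a vectorized implementation is preferable; the list-based version is only meant to show the computation.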

Citation

If you use PAST in your work, please cite:

@article{har2025past,
    title={Past: Phonetic-acoustic speech tokenizer},
    author={Har-Tuv, Nadav and Tal, Or and Adi, Yossi},
    journal={arXiv preprint arXiv:2505.14470},
    year={2025}
}

License

This project is released under the MIT License. See the LICENSE file for details.
