lycorp-jp/PASQA

PASQA

This repository contains materials developed by LY Corporation and is temporarily open-sourced for the purpose of a paper.

Accepted to INTERSPEECH 2026

  • Temporary Release: This repository is temporarily available as open source. It may therefore be made read-only or private at any time.
  • Attribution: All code and materials in this repository are owned by LY Corporation.

Project Overview

PASQA (Pitch-Accent-focused Speech Quality Assessment) is a mean opinion score (MOS) prediction model that explicitly targets pitch-accent correctness. It is trained on a controlled Japanese accent-error dataset, constructed by changing accent patterns using an accent-controllable text-to-speech system, with a pseudo accent-quality score computed from the accent-error rate. PASQA builds on self-supervised representations and employs mora-conditioned fusion, ranking loss, an auxiliary accent-error localization task, and speaker-invariant training.
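
The overview names a ranking loss but does not give its form. As a rough illustration only, the sketch below shows a generic pairwise margin ranking loss in plain Python. The function name `pairwise_ranking_loss`, the `margin` value, and the averaging over pairs are all assumptions made for this example and may differ from what PASQA actually uses.

```python
from itertools import combinations
from typing import Sequence


def pairwise_ranking_loss(
    predicted: Sequence[float],
    target: Sequence[float],
    margin: float = 0.1,  # assumed value, not taken from the repository
) -> float:
    """Generic pairwise hinge ranking loss (illustrative, not PASQA's code).

    For every pair of utterances whose target scores differ, the loss is zero
    when the predictions are ordered the same way with at least `margin`
    between them, and grows linearly otherwise.
    """
    losses = []
    for i, j in combinations(range(len(predicted)), 2):
        if target[i] == target[j]:
            continue  # tied targets carry no ordering information
        sign = 1.0 if target[i] > target[j] else -1.0
        losses.append(max(0.0, margin - sign * (predicted[i] - predicted[j])))
    # Average over the informative pairs; 0.0 when there are none.
    return sum(losses) / len(losses) if losses else 0.0
```

A loss of this kind rewards the model for ordering utterances correctly rather than for matching absolute scores, which is one plausible reason to pair it with a pseudo score derived from an accent-error rate.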

Installation and Usage

1. Install

uv sync

2. Download the pretrained model

Hugging Face

Download the checkpoint .pkl and its config.yml from Hugging Face and place them together anywhere you like. The layout below (e.g. a pretrained/ directory) is only an example:

pasqa/
├── pretrained/
│   ├── checkpoint-100000steps.pkl   # model weights
│   └── config.yml            # auto-discovered from the checkpoint's directory
└── src/pasqa/vocab.txt       # bundled mora vocab (fallback)

The mora vocab is resolved from the config's mora_vocab_path; if that path cannot be found, the bundled src/pasqa/vocab.txt is used automatically.
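
The lookup rules described above can be sketched as follows. This is a minimal illustration of the described behavior, not the package's own code: the helper names `discover_config` and `resolve_mora_vocab` are hypothetical, and only the `config.yml` file name, the `mora_vocab_path` setting, and the bundled `src/pasqa/vocab.txt` path come from this README.

```python
from pathlib import Path
from typing import Optional

# Bundled fallback vocab named in this README (relative to the repository root).
BUNDLED_VOCAB = Path("src/pasqa/vocab.txt")


def discover_config(checkpoint: str) -> Path:
    """Return the config.yml expected in the same directory as the checkpoint."""
    return Path(checkpoint).parent / "config.yml"


def resolve_mora_vocab(
    configured: Optional[str], bundled: Path = BUNDLED_VOCAB
) -> Path:
    """Prefer the configured mora_vocab_path if it exists, else the bundled vocab."""
    if configured:
        candidate = Path(configured)
        if candidate.is_file():
            return candidate
    return bundled
```

With the example layout, `discover_config("pretrained/checkpoint-100000steps.pkl")` points at `pretrained/config.yml`, and a `mora_vocab_path` that no longer exists on the local machine falls back to the bundled file.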

3. Run inference

from pasqa import PasqaPredictor

predictor = PasqaPredictor(
    checkpoint="pretrained/checkpoint-100000steps.pkl",  # path to the downloaded checkpoint
    # config is auto-discovered from the checkpoint's directory (config.yml)
    # device defaults to 'cuda' if available, else 'cpu'
)

result = predictor.predict(
    wav_path="audio.wav",
    mora=["ア", "シ", "タ", "ノ", "テ", "ン", "キ", "ハ", "ハ", "レ", "デ", "ス"],  # katakana mora list is REQUIRED
)

print(result["mos"])  # predicted MOS, ~1–5

Acknowledgements

The model implementation is based on the SHEET toolkit.

Contributions

As this project is temporarily open-sourced, we are not accepting contributions. For feedback or inquiries, please open an issue in this repository.

License

This code is dedicated to the public domain under CC0 1.0. You may copy, modify, and distribute it without restriction, and the authors make no warranties or guarantees regarding its use.

About

PASQA: Pitch-Accent-Focused Speech Quality Assessment Model Trained on Synthetic Speech with Accent Errors
