PASQA

This repository contains materials developed by LY Corporation and is temporarily open-sourced to accompany a paper.

Accepted to INTERSPEECH 2026

Temporary Release: This repository is temporarily available as open source and may be made read-only or private at any time.
Attribution: All code and materials in this repository are owned by LY Corporation.

Project Overview

PASQA (Pitch-Accent-focused Speech Quality Assessment) is a mean opinion score (MOS) prediction model that explicitly targets pitch-accent correctness. It is trained on a controlled Japanese accent-error dataset, constructed by changing accent patterns using an accent-controllable text-to-speech system, with a pseudo accent-quality score computed from the accent-error rate. PASQA builds on self-supervised representations and employs mora-conditioned fusion, ranking loss, an auxiliary accent-error localization task, and speaker-invariant training.
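
To make the idea of a pseudo accent-quality score concrete, here is a minimal sketch. It is not the procedure from the paper: the per-mora high/low labels, the helper names `accent_error_rate` and `pseudo_accent_score`, and the linear mapping onto a 1 to 5 range are all assumptions chosen for illustration only.

```python
from typing import Sequence


def accent_error_rate(reference: Sequence[int], altered: Sequence[int]) -> float:
    """Fraction of morae whose high/low accent label differs from the reference."""
    if len(reference) != len(altered):
        raise ValueError("accent sequences must have the same number of morae")
    if not reference:
        raise ValueError("accent sequences must not be empty")
    mismatches = sum(r != a for r, a in zip(reference, altered))
    return mismatches / len(reference)


def pseudo_accent_score(error_rate: float, low: float = 1.0, high: float = 5.0) -> float:
    """ASSUMPTION: linear map from an error rate in [0, 1] to a MOS-like range."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must lie in [0, 1]")
    return high - (high - low) * error_rate
```

Under these assumptions, an utterance with no altered morae receives the highest score and one with every mora altered receives the lowest; the actual definition of the error rate and its mapping to a score may differ.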

Installation and Usage

1. Install

uv sync

2. Download the pretrained model

Download the checkpoint .pkl and its config.yml from Hugging Face and place them together anywhere you like. The layout below (e.g. a pretrained/ directory) is only an example:

pasqa/
├── pretrained/
│   ├── checkpoint-100000steps.pkl   # model weights
│   └── config.yml            # auto-discovered from the checkpoint's directory
└── src/pasqa/vocab.txt       # bundled mora vocab (fallback)

The mora vocab is resolved from the config's mora_vocab_path; if that path cannot be found, the bundled src/pasqa/vocab.txt is used automatically.
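
The lookup rules described above can be sketched as follows. This is an illustration of the described behavior, not the package's actual code: `find_config`, `resolve_mora_vocab`, and `BUNDLED_VOCAB` are hypothetical names, and the bundled path is taken from the example layout relative to the repository root.

```python
from pathlib import Path
from typing import Optional

# Hypothetical constant: bundled fallback vocab, as shown in the example layout.
BUNDLED_VOCAB = Path("src/pasqa/vocab.txt")


def find_config(checkpoint: str) -> Path:
    """Return the config.yml expected in the same directory as the checkpoint."""
    return Path(checkpoint).parent / "config.yml"


def resolve_mora_vocab(configured: Optional[str], bundled: Path = BUNDLED_VOCAB) -> Path:
    """Prefer the configured path if it exists; otherwise use the bundled vocab."""
    if configured:
        candidate = Path(configured)
        if candidate.is_file():
            return candidate
    return bundled
```

For example, `find_config("pretrained/checkpoint-100000steps.pkl")` points at `pretrained/config.yml`, and a `mora_vocab_path` that no longer exists on the local machine resolves to the bundled file.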

3. Run inference

from pasqa import PasqaPredictor

predictor = PasqaPredictor(
    checkpoint="pretrained/checkpoint-100000steps.pkl",  # path to the downloaded checkpoint
    # config is auto-discovered from the checkpoint's directory (config.yml)
    # device defaults to 'cuda' if available, else 'cpu'
)

result = predictor.predict(
    wav_path="audio.wav",
    mora=["ア", "シ", "タ", "ノ", "テ", "ン", "キ", "ハ", "ハ", "レ", "デ", "ス"],  # katakana mora list is REQUIRED
)

print(result["mos"])  # predicted MOS, ~1–5
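
When several utterances need scoring, the call above can be wrapped in a small loop. The sketch below relies only on the `predict(wav_path=..., mora=...)` call and the `"mos"` key shown above; `score_utterances` and `SupportsPredict` are hypothetical helpers that are not part of the package, and converting the result with `float()` assumes the returned value is a scalar.

```python
from typing import Any, Iterable, List, Mapping, Protocol, Sequence, Tuple


class SupportsPredict(Protocol):
    """Anything exposing the predict(wav_path=..., mora=...) call shown above."""

    def predict(self, wav_path: str, mora: Sequence[str]) -> Mapping[str, Any]: ...


def score_utterances(
    predictor: SupportsPredict,
    utterances: Iterable[Tuple[str, Sequence[str]]],
) -> List[Tuple[str, float]]:
    """Score (wav_path, mora) pairs and return them sorted from lowest to highest MOS."""
    scored = []
    for wav_path, mora in utterances:
        if not mora:
            # The mora list is required, so fail early with the offending file name.
            raise ValueError(f"{wav_path}: a katakana mora list is required")
        result = predictor.predict(wav_path=wav_path, mora=list(mora))
        scored.append((wav_path, float(result["mos"])))
    return sorted(scored, key=lambda pair: pair[1])
```

Calling `score_utterances(predictor, [("audio.wav", ["ア", "シ", "タ"])])` returns a list of `(wav_path, mos)` tuples, with the lowest-scoring utterances first so that likely accent errors are easy to inspect.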

Acknowledgements

The model implementation is based on the SHEET toolkit.

Contributions

As this project is temporarily open-sourced, we are not accepting contributions. For feedback or inquiries, please open an issue in this repository.

License

This code is dedicated to the public domain under CC0 1.0. You may copy, modify, and distribute it without restriction, and the authors make no warranties or guarantees regarding its use.
