Welcome to YAST! This open-source project provides a powerful, flexible SPLADE (Sparse Lexical and Expansion) trainer. Built to integrate seamlessly with Hugging Face's Trainer API, YAST lets you train sparse retrieval models based on the SPLADE line of research papers. Our goal is to offer an accessible tool for training these models. YAST is licensed under the permissive MIT License.
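For readers new to SPLADE: a query or document is encoded into a sparse vector over the model's vocabulary by applying a log-saturation to the MLM logits and max-pooling over token positions, as described in the papers listed under the acknowledgements below. Here is a minimal sketch using the Hugging Face `transformers` API (the checkpoint name is a public example model, not a YAST output):

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Illustrative public SPLADE checkpoint; any MLM-based SPLADE model works.
model_name = "naver/splade-cocondenser-ensembledistil"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

inputs = tokenizer("splade is a sparse retrieval model", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # (1, seq_len, vocab_size)

# SPLADE representation: max over tokens of log(1 + ReLU(logits)),
# with padding positions masked out.
mask = inputs["attention_mask"].unsqueeze(-1)
sparse_vec = (torch.log1p(torch.relu(logits)) * mask).max(dim=1).values[0]

# Only a small fraction of the vocabulary dimensions end up non-zero;
# those are the weighted lexical terms (including expansions).
print(f"{(sparse_vec > 0).sum().item()} active terms")
```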
Please note that YAST is currently experimental: breaking changes may be introduced from time to time. For a stable experience, we recommend forking this repository and pinning your work to a specific revision, e.g. with `git checkout <commit-hash>` after cloning.
This project uses uv for dependency management and requires Python 3.11.
- Python 3.11+
- uv package manager
```bash
# Clone the repository
git clone https://github.com/hotchpotch/yast.git
cd yast

# Create virtual environment and install dependencies
uv venv --python 3.11 .venv
uv sync --extra dev

# Activate virtual environment (optional - you can use uv run instead)
source .venv/bin/activate

# Run training example
uv run python -m yast.run examples/japanese-splade/toy.yaml
```

For improved training speed, install Flash Attention 2:
```bash
uv pip install --no-deps flash-attn --no-build-isolation
uv pip install einops
```

Note: Requires a compatible CUDA GPU and may take time to compile.
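After installation, you can check that the compiled extension imports and request Flash Attention 2 when loading a model through `transformers`. The `attn_implementation` argument is standard `transformers` behavior; whether YAST wires this up automatically is an assumption to verify against the example configs:

```python
import flash_attn
print(flash_attn.__version__)  # confirms the compiled extension loads

import torch
from transformers import AutoModelForMaskedLM

# Flash Attention 2 requires fp16/bf16 and a CUDA device.
# The checkpoint name is illustrative.
model = AutoModelForMaskedLM.from_pretrained(
    "naver/splade-cocondenser-ensembledistil",
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
).to("cuda")
```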
For details on training a Japanese SPLADE model, please see the Japanese SPLADE example. That document is written in Japanese; if you don't read Japanese, online translation tools can help you follow it.
Here are some blog posts related to this project, written in Japanese:
- 高性能な日本語SPLADE(スパース検索)モデルを公開しました ("We released a high-performance Japanese SPLADE (sparse retrieval) model")
- SPLADE モデルの作り方・日本語SPLADEテクニカルレポート ("How to build a SPLADE model: a Japanese SPLADE technical report")
- 情報検索モデルで最高性能(512トークン以下)・日本語版SPLADE v2をリリース ("Japanese SPLADE v2 released: top performance among retrieval models for inputs up to 512 tokens")
Another project, YASEM (Yet Another Splade | Sparse Embedder), offers a more user-friendly implementation for working with SPLADE models.
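If you mainly want to encode text with an already-trained SPLADE model rather than train one, YASEM may be the quicker path. A minimal sketch, assuming YASEM's `SpladeEmbedder` interface (the class and method names here are taken from that project; check the YASEM README for the current API):

```python
from yasem import SpladeEmbedder  # assumed entry point; see the YASEM README

# Illustrative model name; substitute any SPLADE checkpoint, e.g. one
# trained with YAST.
embedder = SpladeEmbedder("naver/splade-v3")

sentences = ["SPLADE is a sparse retrieval model.", "What is dense retrieval?"]
embeddings = embedder.encode(sentences)
print(embedder.similarity(embeddings, embeddings))
```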
We thank the researchers behind the original SPLADE papers for their outstanding contributions to this field.
- SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking
- SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval
- From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective
- An Efficiency Study for SPLADE Models
- A Static Pruning Study on Sparse Neural Retrievers
- SPLADE-v3: New baselines for SPLADE
- Minimizing FLOPs to Learn Efficient Sparse Representations
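As a concrete example of one idea from the list above: the last paper introduces the FLOPS regularizer, which the SPLADE papers adopt to encourage sparsity by penalizing the squared mean activation of each vocabulary term across a batch. A minimal sketch of that loss (the function name and framing are ours for illustration, not YAST's API):

```python
import torch

def flops_regularizer(reps: torch.Tensor) -> torch.Tensor:
    """FLOPS loss from "Minimizing FLOPs to Learn Efficient Sparse
    Representations": for each vocabulary term, take its mean activation
    across the batch, square it, and sum over the vocabulary.

    reps: (batch_size, vocab_size) non-negative sparse representations.
    """
    return (reps.mean(dim=0) ** 2).sum()
```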
This project is licensed under the MIT License. See the LICENSE file for full license details.
Copyright (c) 2024 Yuichi Tateno (@hotchpotch)