"Artificial intelligence is only as good as the data it learns from."
- Unknown
Tuatara is a library for generating fine-tuning pairs for large language model (LLM) post-training.
Fine-tuning large language models requires high-quality training data pairs that are well grounded in their source documents. Creating these pairs manually is laborious and error-prone, and existing tools often lack flexibility or fail to scale across different document types and domains. Tuatara addresses these challenges directly.
Run the following command to install Tuatara:
pip install git+https://github.com/dross20/tuatara

The following example demonstrates how to use Tuatara's preconfigured pipeline to create fine-tuning pairs from multiple documents. By default, default_pipeline uses the OpenAI API for LLM inference and looks for your OpenAI API key in your environment variables.
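Before running the example, make sure your key is visible to the process. A minimal sketch, assuming the pipeline reads the standard OPENAI_API_KEY environment variable used by the OpenAI SDK (you could also export it from your shell instead):

import os

# Assumption: the default pipeline picks up the key from OPENAI_API_KEY,
# the variable the official OpenAI SDK reads. Replace the placeholder below
# with your actual key.
os.environ["OPENAI_API_KEY"] = "sk-..."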
from tuatara import default_pipeline
# Source documents to generate fine-tuning pairs from
documents = [
"./document1.pdf",
"./document2.pdf",
"./document3.txt"
]
# Build the preconfigured pipeline, using GPT-4o for inference
pipeline = default_pipeline(model="gpt-4o")
# Run the pipeline; it returns the generated pairs along with a history object
pairs, history = pipeline(documents)

This project is licensed under the MIT license.