Generates high-quality fine-tuning pairs for large language models (LLMs) from unstructured documents.

dross20/tuatara


"Artificial intelligence is only as good as the data it learns from."
- Unknown

🦎 What is Tuatara?

Tuatara is a library for generating fine-tuning pairs for large language model (LLM) post-training.

🤔 Why Tuatara?

Fine-tuning large language models requires high-quality training data pairs that are well grounded in their source documents. Creating these pairs manually is laborious and error-prone, and existing tools often lack flexibility or fail to scale across different document types and domains. Tuatara addresses these challenges directly.

📦 Installation

Run the following command to install Tuatara:

pip install git+https://github.com/dross20/tuatara

🚀 Quickstart

The following example uses Tuatara's preconfigured pipeline to create fine-tuning pairs from multiple documents. By default, default_pipeline uses the OpenAI API for LLM inference and looks for your OpenAI API key in the environment variables.

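from tuatara import default_pipeline

# Paths to the source documents (PDF and plain-text files).
documents = [
  "./document1.pdf",
  "./document2.pdf",
  "./document3.txt",
]

# Build the preconfigured pipeline; by default it uses the OpenAI API.
pipeline = default_pipeline(model="gpt-4o")

# Run the pipeline over all of the documents.
pairs, history = pipeline(documents)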
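
Because the default pipeline depends on an API key in the environment and on the document paths being readable, a quick check before running it can save a failed run. The following is a minimal sketch, not part of Tuatara: the helper names `check_api_key` and `existing_documents` are hypothetical, and it assumes the key lives in `OPENAI_API_KEY`, the variable the OpenAI Python client reads by default. Tuatara may look it up differently.

```python
import os
from pathlib import Path


def check_api_key(env=None, var_name="OPENAI_API_KEY"):
    """Return True if the named environment variable is set and non-empty.

    Assumption: the key is stored under OPENAI_API_KEY.
    """
    env = os.environ if env is None else env
    return bool(env.get(var_name, "").strip())


def existing_documents(paths):
    """Split paths into (found, missing) lists, preserving input order."""
    found, missing = [], []
    for p in paths:
        if Path(p).is_file():
            found.append(p)
        else:
            missing.append(p)
    return found, missing


if __name__ == "__main__":
    documents = ["./document1.pdf", "./document2.pdf", "./document3.txt"]
    if not check_api_key():
        print("OPENAI_API_KEY is not set")
    found, missing = existing_documents(documents)
    if missing:
        print("Missing documents:", missing)
```

Running these checks first and passing only the `found` list to the pipeline keeps a missing file or unset key from surfacing partway through a run.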

📜 License

This project is licensed under the MIT license.
