* Add CLI option to download files (#34)
* Option to check if file has been uploaded in the past before uploading (#33)
The check is based on filename, file purpose, and file size.
* Add fine-tuning hparams directly into the fine-tunes CLI (#35)
* Update fine_tunes CLI use_packing argument (#38)
* A file verification and remediation tool.
It applies the following validations:
- prints the number of examples, and warns if it's lower than 100
- ensures prompt and completion columns are present
- optionally removes any additional columns
- ensures all completions are non-empty
- infers which type of fine-tuning the data is most likely intended for (classification, conditional generation, or open-ended generation)
- optionally removes duplicate rows
- infers the existence of a common suffix, and if there is none, suggests one for classification and conditional generation
- optionally prepends a space to each completion, to improve tokenization
- optionally splits the data into training and validation sets for the classification use case
- optionally ensures there's an ending string for all completions
- optionally lowercases completions or prompts if more than 1/3 of their alphanumeric characters are upper case
It interactively asks the user to accept or reject each recommendation. If the user is happy, it saves the modified output as a jsonl file, ready to be used for fine-tuning with the printed command.
* Completion: remove from kwargs before passing to EngineAPI (#37)
* Version bump before pushing to external
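
For illustration, a few of the validations listed above could look roughly like the following. This is a minimal sketch, not the tool's actual code: the names check_records, mostly_upper and to_jsonl are hypothetical, and raising on a missing column or empty completion is an assumption. The steps that the real tool offers optionally and interactively (dropping extra columns, deduplication, prepending a space) are applied unconditionally here to keep the example short.

```python
import json

# Column names taken from the validation list above.
REQUIRED = ("prompt", "completion")


def mostly_upper(texts, threshold=1 / 3):
    """Return True if more than `threshold` of alphanumeric chars are upper case."""
    chars = [c for t in texts for c in t if c.isalnum()]
    if not chars:
        return False
    return sum(c.isupper() for c in chars) / len(chars) > threshold


def check_records(records, min_examples=100):
    """Apply a subset of the listed checks; return (cleaned_records, warnings)."""
    warnings = []
    if len(records) < min_examples:
        warnings.append(f"only {len(records)} examples (fewer than {min_examples})")

    cleaned, seen = [], set()
    for i, rec in enumerate(records):
        missing = [k for k in REQUIRED if k not in rec]
        if missing:
            # Assumption: treat a missing column as a hard error.
            raise ValueError(f"record {i} is missing {missing}")
        if not rec["completion"]:
            # Assumption: treat an empty completion as a hard error.
            raise ValueError(f"record {i} has an empty completion")

        # Drop any additional columns.
        row = {k: rec[k] for k in REQUIRED}

        # Skip exact duplicate rows.
        key = (row["prompt"], row["completion"])
        if key in seen:
            continue
        seen.add(key)

        # Prepend a space to the completion if it does not already have one.
        if not row["completion"].startswith(" "):
            row["completion"] = " " + row["completion"]
        cleaned.append(row)

    if mostly_upper([r["completion"] for r in cleaned]):
        warnings.append("completions are mostly upper case; consider lowercasing")
    return cleaned, warnings


def to_jsonl(records):
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records)
```

Calling check_records on a list of dicts returns the cleaned rows plus a list of warning strings; to_jsonl turns the cleaned rows into jsonl text.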
Co-authored-by: Todor Markov <[email protected]>
Co-authored-by: Boris Power <[email protected]>
Co-authored-by: Dave Cummings <[email protected]>