Lots of CLI changes #22

rachellim · 2021-06-29T20:32:10Z

No description provided.


emorikawa

Looks good to me.

emorikawa · 2021-06-29T21:49:30Z

openai/api_resources/completion.py

@@ -19,7 +19,7 @@ def create(cls, *args, **kwargs):
        of valid parameters.
        """
        start = time.time()
-        timeout = kwargs.get("timeout", None)
+        timeout = kwargs.pop("timeout", None)


feroldi · 2022-12-07

@rachellim What's the reason for this change? It prevents the timeout parameter from being passed on to super().create(...), meaning that API users have no way to specify a timeout.

hallacy · 2022-12-08

Hi @feroldi! In trying to avoid a backwards-incompatible change, we added a new param called request_timeout so that users could set a timeout that didn't interfere with the existing timeout functionality. It's documented here: https://github.com/openai/openai-python#params

Does that help?

feroldi · 2023-01-10

Yes! That makes sense. Thanks.
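For readers following the thread, here is a minimal sketch of the behavioural difference between `kwargs.get` and `kwargs.pop` when keyword arguments are forwarded. The functions `forward`, `create_with_get` and `create_with_pop` are hypothetical stand-ins, not the library's code; `forward` only plays the role of `super().create(...)`.

```python
def forward(**kwargs):
    """Hypothetical stand-in for super().create(...): returns what it receives."""
    return kwargs


def create_with_get(**kwargs):
    # Before the change: "timeout" is read but stays in kwargs,
    # so it is also forwarded downstream.
    timeout = kwargs.get("timeout", None)
    return timeout, forward(**kwargs)


def create_with_pop(**kwargs):
    # After the change: "timeout" is read and removed from kwargs,
    # so the downstream call never sees it.
    timeout = kwargs.pop("timeout", None)
    return timeout, forward(**kwargs)


if __name__ == "__main__":
    print(create_with_get(timeout=5, prompt="hi"))
    print(create_with_pop(timeout=5, prompt="hi"))
    # A separately named parameter is untouched by the pop and is forwarded.
    print(create_with_pop(timeout=5, request_timeout=10, prompt="hi"))
```

In this sketch a differently named key such as `request_timeout` passes through unchanged, which is consistent with the reply above about adding a separate parameter.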

* Add CLI option to download files (openai#34)

* Option to check if file has been uploaded in the past before uploading (openai#33)

  The check is done based on filename, file purpose and file size

* Add fine-tuning hparams directly into the fine-tunes CLI (openai#35)

* update fine_tunes cli use_packing argument (openai#38)

* A file verification and remediation tool.

  It applies the following validations:
  - prints the number of examples, and warns if it's lower than 100
  - ensures prompt and completion columns are present
  - optionally removes any additional columns
  - ensures all completions are non-empty
  - infers which type of fine-tuning the data is most likely in (classification, conditional generation and open-ended generation)
  - optionally removes duplicate rows
  - infers the existence of a common suffix, and if there is none, suggests one for classification and conditional generation
  - optionally prepends a space to each completion, to make tokenization better
  - optionally splits into training and validation set for the classification use case
  - optionally ensures there's an ending string for all completions
  - optionally lowercases completions or prompts if more than a 1/3 of alphanumeric characters are upper case

  It interactively asks the user to accept or reject recommendations. If the user is happy, then it saves the modified output file as a jsonl, which is ready for being used in fine-tuning with the printed command.

* Completion: remove from kwargs before passing to EngineAPI (openai#37)

* Version bump before pushing to external

Co-authored-by: Todor Markov <[email protected]>
Co-authored-by: Boris Power <[email protected]>
Co-authored-by: Dave Cummings <[email protected]>
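To make a few of the validations listed in the commit message concrete, here is a rough sketch over rows held as plain dicts. It is not the tool's implementation: `validate` and `upper_ratio` are hypothetical names, the sketch applies each step unconditionally instead of asking the user, and dropping rows with empty completions is this sketch's own choice.

```python
from typing import Dict, List

MIN_EXAMPLES = 100  # threshold named in the commit message


def upper_ratio(texts: List[str]) -> float:
    """Fraction of alphanumeric characters that are upper case."""
    chars = [c for t in texts for c in t if c.isalnum()]
    return sum(c.isupper() for c in chars) / len(chars) if chars else 0.0


def validate(rows: List[Dict[str, str]]) -> Dict[str, object]:
    """Run a handful of the listed checks (hypothetical helper)."""
    warnings: List[str] = []
    if len(rows) < MIN_EXAMPLES:
        warnings.append(f"only {len(rows)} examples (fewer than {MIN_EXAMPLES})")

    # Both columns must be present in every row.
    for i, row in enumerate(rows):
        missing = {"prompt", "completion"} - set(row)
        if missing:
            raise ValueError(f"row {i} is missing {sorted(missing)}")

    # Keep only the two expected columns; skip rows with empty completions.
    cleaned = [
        {"prompt": r["prompt"], "completion": r["completion"]}
        for r in rows
        if r["completion"] != ""
    ]

    # Remove duplicate rows, keeping the first occurrence of each.
    seen = set()
    deduped = []
    for r in cleaned:
        key = (r["prompt"], r["completion"])
        if key not in seen:
            seen.add(key)
            deduped.append(r)

    # Prepend a space to each completion that does not already start with one.
    for r in deduped:
        if not r["completion"].startswith(" "):
            r["completion"] = " " + r["completion"]

    # Flag completions where more than 1/3 of alphanumerics are upper case.
    if upper_ratio([r["completion"] for r in deduped]) > 1 / 3:
        warnings.append("completions are mostly upper case; consider lowercasing")

    return {"rows": deduped, "warnings": warnings}
```

The remaining steps (type inference, suffix detection, train/validation split, writing JSONL) are left out to keep the sketch short.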

todor-markov and others added 7 commits June 29, 2021 13:18

Add CLI option to download files (#34)

3b2f698

Option to check if file has been uploaded in the past before uploading (#33)

74de0ca

The check is done based on filename, file purpose and file size
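A check of this kind could look roughly like the sketch below. The function name `find_previous_uploads` and the record keys `filename`, `purpose` and `bytes` are assumptions made for illustration; the commit only states which three properties are compared.

```python
from typing import Dict, Iterable, List


def find_previous_uploads(
    filename: str,
    purpose: str,
    size_in_bytes: int,
    remote_files: Iterable[Dict[str, object]],
) -> List[Dict[str, object]]:
    """Return remote records that match on filename, purpose and size.

    Record keys are assumed, not taken from the CLI source.
    """
    return [
        f
        for f in remote_files
        if f.get("filename") == filename
        and f.get("purpose") == purpose
        and f.get("bytes") == size_in_bytes
    ]


if __name__ == "__main__":
    remote = [
        {"id": "file-1", "filename": "train.jsonl", "purpose": "fine-tune", "bytes": 2048},
        {"id": "file-2", "filename": "train.jsonl", "purpose": "fine-tune", "bytes": 4096},
    ]
    print(find_previous_uploads("train.jsonl", "fine-tune", 2048, remote))
```

Matching on these three properties is a heuristic: two files with the same name, purpose and size can still differ in content, so a match is best treated as a prompt to the user and not as proof of a duplicate.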

Add fine-tuning hparams directly into the fine-tunes CLI (#35)

71c347d

update fine_tunes cli use_packing argument (#38)

7384a76

Completion: remove from kwargs before passing to EngineAPI (#37)

868188a

Version bump before pushing to external

32c2010

rachellim requested a review from emorikawa June 29, 2021 21:48

emorikawa approved these changes Jun 29, 2021


rachellim merged commit 7ddcba1 into main Jun 29, 2021

rachellim deleted the rachel/lots-of-changes branch June 29, 2021 21:55


feroldi Dec 7, 2022


hallacy Dec 8, 2022


feroldi Jan 10, 2023
