Lots of CLI changes #22

Merged
rachellim merged 7 commits into main from rachel/lots-of-changes
Jun 29, 2021

rachellim
Collaborator

No description provided.

todor-markov and others added 7 commits June 29, 2021 13:18
Option to check if file has been uploaded in the past before uploading (#33)

The check is done based on filename, file purpose and file size
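
A minimal sketch of such a check, assuming the remote file listing is available as a list of dictionaries with `filename`, `purpose` and `bytes` keys. The function name `find_matching_uploads` and the data shape are hypothetical, chosen for illustration; this is not the code added in the commit.

```python
import os
from typing import Iterable, List, Mapping, Optional


def find_matching_uploads(
    local_path: str,
    purpose: str,
    remote_files: Iterable[Mapping[str, object]],
    size: Optional[int] = None,
) -> List[Mapping[str, object]]:
    """Return remote entries whose filename, purpose and size match the local file.

    `remote_files` is an assumed shape: one mapping per previously uploaded
    file, with "filename", "purpose" and "bytes" keys.
    """
    name = os.path.basename(local_path)
    if size is None:
        # Fall back to the size on disk when the caller does not supply one.
        size = os.path.getsize(local_path)
    return [
        f
        for f in remote_files
        if f.get("filename") == name
        and f.get("purpose") == purpose
        and f.get("bytes") == size
    ]
```

A caller could skip the upload, or ask the user for confirmation, whenever the returned list is non-empty. Matching on name, purpose and size is a heuristic: two different files can share all three.
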
It applies the following validations:
- prints the number of examples, and warns if it's lower than 100
- ensures prompt and completion columns are present
- optionally removes any additional columns
- ensures all completions are non-empty
- infers which type of fine-tuning the data is most likely intended for (classification, conditional generation, or open-ended generation)
- optionally removes duplicate rows
- infers the existence of a common suffix, and if there is none, suggests one for classification and conditional generation
- optionally prepends a space to each completion, to make tokenization better
- optionally splits into training and validation set for the classification use case
- optionally ensures there's an ending string for all completions
- optionally lowercases completions or prompts if more than 1/3 of alphanumeric characters are upper case

It interactively asks the user to accept or reject each recommendation. If the user is satisfied, it saves the modified output as a JSONL file, ready to be used for fine-tuning with the printed command.
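
A few of the validations listed above can be sketched with the standard library alone. The helper names below are hypothetical and the records are assumed to be plain dictionaries; this illustrates the described checks, not the tool's actual implementation.

```python
import os

REQUIRED_COLUMNS = ("prompt", "completion")
MIN_EXAMPLES = 100  # warning threshold taken from the description above


def too_few_examples(records):
    """True when the dataset is smaller than the warning threshold."""
    return len(records) < MIN_EXAMPLES


def has_required_columns(records):
    """True when every record carries both a prompt and a completion."""
    return all(col in rec for rec in records for col in REQUIRED_COLUMNS)


def drop_extra_columns(records):
    """Keep only the prompt and completion fields."""
    return [{col: rec[col] for col in REQUIRED_COLUMNS} for rec in records]


def empty_completion_rows(records):
    """Indices of records whose completion is empty or whitespace only."""
    return [i for i, rec in enumerate(records) if not rec["completion"].strip()]


def drop_duplicates(records):
    """Remove repeated (prompt, completion) pairs, keeping the first occurrence."""
    seen = set()
    unique = []
    for rec in records:
        key = (rec["prompt"], rec["completion"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def common_suffix(texts):
    """Longest suffix shared by all texts, or '' when there is none."""
    # commonprefix compares character by character, so reversing every
    # string turns it into a common-suffix search.
    return os.path.commonprefix([t[::-1] for t in texts])[::-1]


def ensure_leading_space(records):
    """Prepend a space to completions that do not already start with one."""
    return [
        {**rec, "completion": rec["completion"]
         if rec["completion"].startswith(" ") else " " + rec["completion"]}
        for rec in records
    ]


def mostly_upper(texts, threshold=1 / 3):
    """True when more than `threshold` of alphanumeric characters are upper case."""
    chars = [c for t in texts for c in t if c.isalnum()]
    if not chars:
        return False
    return sum(c.isupper() for c in chars) / len(chars) > threshold
```

Each helper returns a finding or a new list rather than editing in place, which fits an interactive flow where the user accepts or rejects every suggested change.
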
@rachellim rachellim requested a review from emorikawa June 29, 2021 21:48
Contributor

@emorikawa emorikawa left a comment

Looks good to me.

@@ -19,7 +19,7 @@ def create(cls, *args, **kwargs):
         of valid parameters.
         """
         start = time.time()
-        timeout = kwargs.get("timeout", None)
+        timeout = kwargs.pop("timeout", None)
Contributor

👍

@rachellim What's the reason for this change? It prevents the timeout parameter from being passed on to super().create(...), meaning that API users have no way to specify a timeout.

Collaborator

Hi @feroldi! In trying to avoid a backwards incompatible change, we added a new param called request_timeout so that users could set a timeout that didn't interfere with the existing timeout functionality. It's documented here: https://github.com/openai/openai-python#params

Does that help?

Yes! That makes sense. Thanks.
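
For readers following the thread, the difference between `kwargs.get` and `kwargs.pop` can be sketched as below. `split_timeout`, `backend_create` and `create` are hypothetical stand-ins written for illustration, not the library's code; only the `timeout` and `request_timeout` parameter names come from the discussion above.

```python
def split_timeout(kwargs):
    """Return (timeout, remaining kwargs) without mutating the caller's dict."""
    remaining = dict(kwargs)
    # pop() removes the key, so "timeout" is not forwarded any further;
    # get() would have left it in place.
    timeout = remaining.pop("timeout", None)
    return timeout, remaining


def backend_create(**params):
    """Hypothetical stand-in for the underlying request call; echoes its input."""
    return params


def create(**kwargs):
    """Consume `timeout` locally and forward everything else."""
    timeout, forwarded = split_timeout(kwargs)
    # `request_timeout`, if supplied, is still part of `forwarded`.
    return {"timeout": timeout, "sent": backend_create(**forwarded)}
```

With this shape, `timeout` is handled by the wrapper itself, while a separately named `request_timeout` still reaches the underlying call.
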

@rachellim rachellim merged commit 7ddcba1 into main Jun 29, 2021
@rachellim rachellim deleted the rachel/lots-of-changes branch June 29, 2021 21:55
cgayapr pushed a commit to cgayapr/openai-python that referenced this pull request Dec 14, 2024
* Add CLI option to download files (openai#34)

* Option to check if file has been uploaded in the past before uploading (openai#33)

The check is done based on filename, file purpose and file size

* Add fine-tuning hparams directly into the fine-tunes CLI (openai#35)

* update fine_tunes cli use_packing argument (openai#38)

* A file verification and remediation tool.

It applies the following validations:
- prints the number of examples, and warns if it's lower than 100
- ensures prompt and completion columns are present
- optionally removes any additional columns
- ensures all completions are non-empty
- infers which type of fine-tuning the data is most likely intended for (classification, conditional generation, or open-ended generation)
- optionally removes duplicate rows
- infers the existence of a common suffix, and if there is none, suggests one for classification and conditional generation
- optionally prepends a space to each completion, to make tokenization better
- optionally splits into training and validation set for the classification use case
- optionally ensures there's an ending string for all completions
- optionally lowercases completions or prompts if more than 1/3 of alphanumeric characters are upper case

It interactively asks the user to accept or reject each recommendation. If the user is satisfied, it saves the modified output as a JSONL file, ready to be used for fine-tuning with the printed command.

* Completion: remove timeout from kwargs before passing to EngineAPI (openai#37)

* Version bump before pushing to external

Co-authored-by: Todor Markov <[email protected]>
Co-authored-by: Boris Power <[email protected]>
Co-authored-by: Dave Cummings <[email protected]>