Inaccurate time estimation results for fine-tuning use-case #128

Closed
zafercavdar opened this issue Oct 6, 2022 · 1 comment
@zafercavdar
Contributor

zafercavdar commented Oct 6, 2022

This PR introduces time estimation functionality for fine-tuning tasks. We observed in our experiments that the estimated values are quite inaccurate, and we have a few questions and suggestions:

Question 1: Is there any public information about where constants like 0.0515 (line 601) come from?
My DataFrame, used to fine-tune a `curie` model for 2 epochs, contains 8236 rows. Our aim was to train an open-ended generator, which is why the prompt column is completely empty. However, running `memory_usage` on this df returns the same values for the prompt and completion columns, even though the completion column contains fairly long text values.

[Screenshot (2022-10-06 17:12): `df.memory_usage()` output]

If I instead use the `sys` module to get the size of the df, I get a very different result.

[Screenshot (2022-10-06 17:14): `sys.getsizeof(df)` output]

If I add the `deep=True` parameter to pandas' `memory_usage` call, the returned value becomes very similar to the `sys` output.

[Screenshot (2022-10-06 17:16): `df.memory_usage(deep=True)` output]
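The difference can be reproduced with a small hypothetical example (the column names only mirror the fine-tuning format; the data is made up):

```python
import sys
import pandas as pd

# Hypothetical stand-in for the fine-tuning DataFrame described above:
# an empty prompt column and a completion column with long text.
df = pd.DataFrame({
    "prompt": [""] * 1000,
    "completion": ["a fairly long completion text " * 20] * 1000,
})

# Shallow memory_usage counts only the 8-byte object pointers for string
# columns, so prompt and completion report the same size despite the
# very different string lengths.
shallow = df.memory_usage()
print(shallow["prompt"] == shallow["completion"])  # True

# deep=True introspects the string objects themselves, so the long
# completions now dominate.
deep = df.memory_usage(deep=True)
print(deep["completion"] > deep["prompt"])  # True

# sys.getsizeof on a DataFrame delegates to the deep accounting, which is
# why it agrees with memory_usage(deep=True) rather than the shallow sum.
print(sys.getsizeof(df) >= deep.sum())  # True (getsizeof adds a small GC overhead)
```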

Question 2: Based on the trials above, is there a reason the estimator doesn't use the `deep=True` flag to measure the memory actually consumed?

Question 3: Does this estimator assume a particular number of epochs?
The time estimator returns 1.92 hours (approximately 115 minutes) for my dataset, but when I trained on the same df for 2 epochs it took 17 minutes in total, roughly 9 minutes per epoch. Presumably the estimator cannot take this parameter into account, since the epoch count isn't available until the fine-tuning call is made.

> Once your model starts training, it'll approximately take 1.93 hours to train a `curie` model, and less for `ada` and `babbage`. Queue will approximately take half an hour per job ahead of you

```
....
[2022-10-06 15:56:26] Fine-tune enqueued. Queue number: 0
[2022-10-06 15:56:29] Fine-tune started
[2022-10-06 16:05:33] Completed epoch 1/2
[2022-10-06 16:13:42] Completed epoch 2/2
```
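For reference, the ~9 minutes per epoch figure follows directly from the log timestamps:

```python
from datetime import datetime

# Timestamps taken from the fine-tune log above.
fmt = "%Y-%m-%d %H:%M:%S"
started = datetime.strptime("2022-10-06 15:56:29", fmt)
epoch2 = datetime.strptime("2022-10-06 16:13:42", fmt)

total_minutes = (epoch2 - started).total_seconds() / 60
per_epoch = total_minutes / 2
print(round(total_minutes, 1), round(per_epoch, 1))  # 17.2 8.6
```

So the actual run took roughly 17 minutes for 2 epochs, versus the estimated 1.92 hours.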

Suggestions:

  • More documentation about constant values like 0.0515
  • Adding the `deep=True` flag to the `memory_usage` call and updating the constants accordingly
  • Adding the epoch-count assumption to the log message, e.g. "Once your model starts training, it'll approximately take 1.93 hours to train a `curie` model for x epochs based on historical statistics, and less ..."
@zafercavdar zafercavdar changed the title Very inaccurate time estimation results for fine-tuning use-case Inaccurate time estimation results for fine-tuning use-case Oct 6, 2022
@rattrayalex
Collaborator

I believe these helpers have been removed. If the problem remains in v1 of this library, please let me know and we can reopen.
