This PR introduces time estimation functionality for fine-tuning tasks. We observed in our experiments that the estimated values are quite inaccurate, and we have a few questions and suggestions:
Question 1: Is there any public information about where constants like `0.0515` (line 601) come from?
The data frame I used for fine-tuning a `curie` model for 2 epochs contains 8236 rows. Our aim was to train an open-ended generator, which is why the `prompt` column is completely empty. However, running `memory_usage` on this df gives the same values for the `prompt` and `completion` columns. Please note that the `completion` column contains fairly long text values.
If I use the `sys` module to get the size of the df on the system, I get a very different result.
If I add the `deep=True` parameter to pandas' `memory_usage` call, the returned value becomes very similar to the `sys` output.
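For reference, a minimal reproduction of this observation (the frame below is made up to mimic the setup described above, not the actual training data):

```python
import sys

import pandas as pd

# Made-up frame mimicking the setup above: empty prompts, long completions.
df = pd.DataFrame({
    "prompt": [""] * 8236,
    "completion": ["a fairly long completion text " * 40] * 8236,
})

# Shallow accounting only counts the 8-byte object pointers per cell,
# so both object columns report identical sizes regardless of content.
print(df.memory_usage())

# deep=True inspects the string objects themselves; the total is very
# close to what sys.getsizeof reports for the whole frame.
print(df.memory_usage(deep=True))
print(sys.getsizeof(df))
```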
Question 2: Based on the previous trials, is there any reason why the estimator doesn't use the `deep=True` flag to get the memory actually consumed on the system?
Question 3: Does this estimator make any assumption about the number of epochs?
The time estimator returns 1.92 hours (approximately 115 minutes) for my dataset, but when I trained on the same df for 2 epochs, it took 17 minutes in total (~9 minutes per epoch). The estimator cannot take this parameter into account because it isn't available until the fine-tuning call is made.
The estimator printed:

> Once your model starts training, it'll approximately take 1.93 hours to train a `curie` model, and less for `ada` and `babbage`. Queue will approximately take half an hour per job ahead of you

while the actual fine-tune log was:

```
....
[2022-10-06 15:56:26] Fine-tune enqueued. Queue number: 0
[2022-10-06 15:56:29] Fine-tune started
[2022-10-06 16:05:33] Completed epoch 1/2
[2022-10-06 16:13:42] Completed epoch 2/2
```
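For context, the estimate is produced from the prepared data alone, while the epoch count is only specified later when the fine-tune is created. With the legacy 0.x SDK this looks roughly like the sketch below (the file ID is a placeholder):

```python
import openai

# The time estimate is printed before any fine-tune exists, so n_epochs
# is not known to it at that point. Epochs are only chosen here, at
# creation time (legacy 0.x SDK call; "file-abc123" is a placeholder
# for an uploaded training file ID).
openai.FineTune.create(
    training_file="file-abc123",
    model="curie",
    n_epochs=2,
)
```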
Suggestions:

- More documentation about constant values like `0.0515`
- Adding the `deep=True` flag to the `memory_usage` call and updating the constants accordingly
- Adding information about the epoch count assumption to the log message, like *Once your model starts training, it'll approximately take 1.93 hours to train a `curie` model for x epochs based on historical statistics, and less ...* (see the sketch after this list)
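A minimal sketch of what the last two suggestions could look like combined. The function name, the way the `0.0515` constant is applied, and the default of 4 epochs are all assumptions for illustration, not the library's actual implementation:

```python
import pandas as pd


def format_time_estimate(df: pd.DataFrame, assumed_epochs: int = 4) -> str:
    """Hypothetical reworked estimate message -- not the actual CLI code."""
    # Suggestion 2: measure real memory, including string contents.
    size_mb = df.memory_usage(deep=True).sum() / 1e6

    # Placeholder formula: how 0.0515 is really applied (and to what units)
    # is exactly what Question 1 asks about; the constant would also need
    # to be re-fitted once deep=True changes the measured sizes.
    hours = size_mb * 0.0515

    # Suggestion 3: state the epoch assumption explicitly in the message.
    return (
        f"Once your model starts training, it'll approximately take "
        f"{hours:.2f} hours to train a `curie` model for {assumed_epochs} "
        f"epochs based on historical statistics, and less for `ada` and `babbage`."
    )
```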
zafercavdar changed the title *Very inaccurate time estimation results for fine-tuning use-case* to *Inaccurate time estimation results for fine-tuning use-case* on Oct 6, 2022