Description
This PR introduces time estimation functionality for fine-tuning tasks. In our experiments we observed that the estimated values are quite inaccurate, so we have a few questions and suggestions:
Question 1: Is there any public information about where constants like `0.0515` (line 601) come from?
My data frame, which was used for fine-tuning a `curie` model for 2 epochs, contains 8236 rows. Our aim was to train an open-ended generator, so the `prompt` column is completely empty. However, running `memory_usage` on this df gives the same values for the `prompt` and `completion` columns, even though the `completion` column contains pretty long text values. If I use the `sys` module to get the size of the df in memory, I get a very different result. If I add the `deep=True` parameter to pandas' `memory_usage` call, the returned value becomes very similar to the `sys` output.
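A minimal sketch of the comparison described above; the frame construction is an assumption for illustration, not the actual dataset:

```python
import sys

import pandas as pd

# Illustrative frame only (assumed shape): an empty "prompt" column
# and long text values in "completion".
df = pd.DataFrame({
    "prompt": [""] * 8236,
    "completion": ["a fairly long completion text " * 20] * 8236,
})

# Shallow measurement: object columns are counted as pointer arrays only,
# so "prompt" and "completion" report identical sizes here.
print(df.memory_usage())

# Deep measurement: the actual string payloads are inspected,
# so "completion" comes out far larger than "prompt".
print(df.memory_usage(deep=True))

# System-level size of the whole frame, close to the deep total.
print(sys.getsizeof(df))
```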
Question 2: Based on the previous trials, is there any reason why the estimator doesn't use the `deep=True` flag to measure the memory actually consumed?
Question 3: Does this estimator make any assumption about the number of epochs?
The time estimator returns 1.92 hours (approximately 115 minutes) for my dataset. When I started training on the same df for 2 epochs, it took 17 minutes in total, roughly 9 minutes per epoch. It doesn't take this parameter into account because it isn't available until the fine-tuning call is made. Output from my run:
Once your model starts training, it'll approximately take 1.93 hours to train a `curie` model, and less for `ada` and `babbage`. Queue will approximately take half an hour per job ahead of you
```
...
[2022-10-06 15:56:26] Fine-tune enqueued. Queue number: 0
[2022-10-06 15:56:29] Fine-tune started
[2022-10-06 16:05:33] Completed epoch 1/2
[2022-10-06 16:13:42] Completed epoch 2/2
```
Suggestions:
- More documentation about constant values like `0.0515`
- Adding the `deep=True` flag to the `memory_usage` call and updating the constants accordingly
- Adding information about the epoch count assumption to the log message, like: "Once your model starts training, it'll approximately take 1.93 hours to train a `curie` model for x epochs based on historical statistics, and less ..." (see the sketch after this list)
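A rough sketch of what such a message could look like; the variable names and the epoch value here are assumptions for illustration, not the actual CLI implementation:

```python
# Hypothetical wording only; names and values are assumptions, not the real CLI code.
n_epochs = 2            # epoch count assumed by the estimate
estimated_hours = 1.93  # value produced by the estimator

print(
    f"Once your model starts training, it'll approximately take {estimated_hours} hours "
    f"to train a `curie` model for {n_epochs} epochs based on historical statistics, "
    "and less for `ada` and `babbage`."
)
```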