This PR introduces time estimation functionality for fine-tuning tasks. We observed in our experiments that the estimated values are quite inaccurate, and we have a few questions and suggestions:
Question 1: Is there any public information about where constants like `0.0515` (line 601) come from?
The data frame I used for fine-tuning a `curie` model for 2 epochs contains 8236 rows. Our aim was to train an open-ended generator, which is why the `prompt` column is completely empty. However, running `memory_usage` on this df gives the same values for the `prompt` and `completion` columns. Please note that the `completion` column contains fairly long text values.
If I use the `sys` module to get the size of the df on the system, I get a very different result.
If I add the `deep=True` parameter to pandas' `memory_usage` call, the returned value becomes very similar to the `sys` output.
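For reference, a minimal reproduction of this observation (the frame below is made up to mimic the setup described above, not the actual training data):

```python
import sys

import pandas as pd

# Made-up frame mimicking the setup above: empty prompts, long completions.
df = pd.DataFrame({
    "prompt": [""] * 8236,
    "completion": ["a fairly long completion text " * 40] * 8236,
})

# Shallow accounting only counts the 8-byte object pointers per cell,
# so both object columns report identical sizes regardless of content.
print(df.memory_usage())

# deep=True inspects the string objects themselves; the total is very
# close to what sys.getsizeof reports for the whole frame.
print(df.memory_usage(deep=True))
print(sys.getsizeof(df))
```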
Question 2: Based on the previous trials, is there any reason why the estimator doesn't use the `deep=True` flag to get the memory actually consumed on the system?
Question 3: Does this estimator make any assumption about the number of epochs?
The time estimator returns 1.92 hours (approximately 115 minutes) for my dataset, but when I trained on the same df for 2 epochs, it took 17 minutes in total (~9 minutes per epoch). The estimator cannot take this parameter into account because it isn't available until the fine-tuning call is made.
The estimator printed:

> Once your model starts training, it'll approximately take 1.93 hours to train a `curie` model, and less for `ada` and `babbage`. Queue will approximately take half an hour per job ahead of you

while the actual fine-tune log was:

```
....
[2022-10-06 15:56:26] Fine-tune enqueued. Queue number: 0
[2022-10-06 15:56:29] Fine-tune started
[2022-10-06 16:05:33] Completed epoch 1/2
[2022-10-06 16:13:42] Completed epoch 2/2
```
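For context, the estimate is produced from the prepared data alone, while the epoch count is only specified later when the fine-tune is created. With the legacy 0.x SDK this looks roughly like the sketch below (the file ID is a placeholder):

```python
import openai

# The time estimate is printed before any fine-tune exists, so n_epochs
# is not known to it at that point. Epochs are only chosen here, at
# creation time (legacy 0.x SDK call; "file-abc123" is a placeholder
# for an uploaded training file ID).
openai.FineTune.create(
    training_file="file-abc123",
    model="curie",
    n_epochs=2,
)
```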
Suggestions:

- More documentation about constant values like `0.0515`
- Adding the `deep=True` flag to the `memory_usage` call and updating the constants accordingly
- Adding information about the epoch count assumption to the log message, like *Once your model starts training, it'll approximately take 1.93 hours to train a `curie` model for x epochs based on historical statistics, and less ...* (see the sketch after this list)
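A minimal sketch of what the last two suggestions could look like combined. The function name, the way the `0.0515` constant is applied, and the default of 4 epochs are all assumptions for illustration, not the library's actual implementation:

```python
import pandas as pd


def format_time_estimate(df: pd.DataFrame, assumed_epochs: int = 4) -> str:
    """Hypothetical reworked estimate message -- not the actual CLI code."""
    # Suggestion 2: measure real memory, including string contents.
    size_mb = df.memory_usage(deep=True).sum() / 1e6

    # Placeholder formula: how 0.0515 is really applied (and to what units)
    # is exactly what Question 1 asks about; the constant would also need
    # to be re-fitted once deep=True changes the measured sizes.
    hours = size_mb * 0.0515

    # Suggestion 3: state the epoch assumption explicitly in the message.
    return (
        f"Once your model starts training, it'll approximately take "
        f"{hours:.2f} hours to train a `curie` model for {assumed_epochs} "
        f"epochs based on historical statistics, and less for `ada` and `babbage`."
    )
```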
zafercavdar changed the title *Very inaccurate time estimation results for fine-tuning use-case* to *Inaccurate time estimation results for fine-tuning use-case* on Oct 6, 2022