Incorrect connection of data points can happen when logging out of order with implicit step value #3278

@cdalinghaus

🐛 Bug

Summary: When metrics are logged with only the epoch parameter, the step value is assigned incrementally, in the order the calls are made. When calls arrive out of order (for example, asynchronous evaluation on a batch system), an epoch/value graph connects the points incorrectly, because step, not the value selected for the x-axis, determines the order in which data points are connected.

To reproduce

Pseudocode:

run.track(float(train_loss), name='train_loss', epoch=1)
run.track(float(eval_loss), name='eval_loss', epoch=1)
run.track(float(train_loss), name='train_loss', epoch=2)
run.track(float(train_loss), name='train_loss', epoch=3)
run.track(float(train_loss), name='train_loss', epoch=4)
run.track(float(eval_loss), name='eval_loss', epoch=3) # Out of order due to scheduling 
run.track(float(eval_loss), name='eval_loss', epoch=2) # Out of order due to scheduling

In my specific setting, eval_loss is calculated by a separate process and saved to disk. Periodically, the main process (which also runs the training and the Aim logger) picks up the value from disk and logs it.
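The effect described above can be illustrated without Aim at all. The following is a minimal sketch, assuming the implicit step is a per-metric counter that increases with each call; `FakeRun` is a hypothetical stand-in written for this example, not Aim's implementation, and the loss values are made up.

```python
from collections import defaultdict


class FakeRun:
    """Hypothetical stand-in for a tracker; NOT Aim's implementation."""

    def __init__(self):
        # Assumption: one implicit step counter per metric name.
        self._next_step = defaultdict(int)
        # Metric name -> list of (step, epoch, value) tuples.
        self.points = defaultdict(list)

    def track(self, value, name, epoch, step=None):
        # When no step is given, fall back to the incrementing counter.
        if step is None:
            step = self._next_step[name]
        self._next_step[name] = step + 1
        self.points[name].append((step, epoch, value))


run = FakeRun()
# Same call order for eval_loss as in the report; values are illustrative.
run.track(0.9, name='eval_loss', epoch=1)
run.track(0.5, name='eval_loss', epoch=3)  # out of order due to scheduling
run.track(0.7, name='eval_loss', epoch=2)  # out of order due to scheduling

# Connecting points in step order visits the epochs in call order.
by_step = sorted(run.points['eval_loss'])
epochs_in_draw_order = [epoch for _, epoch, _ in by_step]
print(epochs_in_draw_order)  # [1, 3, 2]
```

With epoch on the x-axis, a line drawn in this order runs from epoch 1 to epoch 3 and then back to epoch 2, which matches the incorrect connection reported here.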

Expected behavior

The order in which data points are connected should be dictated by whatever is selected for the x-axis.
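One way to picture the expected behavior is a sort on the selected x-axis key before the line is drawn. This is only a sketch of the idea; `connection_order` and the point dictionaries are hypothetical names for illustration and say nothing about how the Aim UI is actually implemented.

```python
def connection_order(points, x_axis):
    """Return points in the order a line should connect them for the chosen x-axis."""
    return sorted(points, key=lambda p: p[x_axis])


# Illustrative points: steps follow call order, epochs arrived out of order.
points = [
    {"step": 0, "epoch": 1, "value": 0.9},
    {"step": 1, "epoch": 3, "value": 0.5},
    {"step": 2, "epoch": 2, "value": 0.7},
]

# Epoch values in the order the line would visit them.
by_epoch = [p["epoch"] for p in connection_order(points, "epoch")]
by_step = [p["epoch"] for p in connection_order(points, "step")]
print(by_epoch)  # [1, 2, 3]
print(by_step)   # [1, 3, 2]
```

Ordering by the selected axis gives a monotonic line for the epoch view, while ordering by step reproduces the zig-zag.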

Environment

  • Aim Version: v3.27.0
  • Python version: latest
  • pip version: latest
  • OS: Linux

Additional context

Workaround: always compute and log an explicit step value as well:

run.track(float(eval_loss), name='eval_loss', epoch=epoch, step=len(train_dataloader) * epoch)
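The workaround can be wrapped in a small helper so that every call derives step from epoch the same way. This is a sketch under assumptions: `track_with_explicit_step` and `RecordingRun` are hypothetical names introduced here, and `batches_per_epoch` stands for `len(train_dataloader)` from the line above. The helper only relies on the `run.track(value, name=..., epoch=..., step=...)` call already shown in this report.

```python
def track_with_explicit_step(run, value, name, epoch, batches_per_epoch):
    """Log a metric with a step derived from the epoch, not from call order."""
    step = batches_per_epoch * epoch
    run.track(float(value), name=name, epoch=epoch, step=step)
    return step


class RecordingRun:
    """Hypothetical stub that records calls; used here in place of a real run."""

    def __init__(self):
        self.calls = []

    def track(self, value, name, epoch, step):
        self.calls.append((name, epoch, step, value))


run = RecordingRun()
# Out-of-order arrival, as in the report; 100 batches per epoch is illustrative.
for epoch, loss in [(1, 0.9), (3, 0.5), (2, 0.7)]:
    track_with_explicit_step(run, loss, 'eval_loss', epoch, batches_per_epoch=100)

# Sorting by step now agrees with sorting by epoch.
epochs_by_step = [e for _, e, _, _ in sorted(run.calls, key=lambda c: c[2])]
print(epochs_by_step)  # [1, 2, 3]
```

Because step is now a function of epoch, the order of the calls no longer affects how the points are connected.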

Labels: help wanted, type / bug
