Description
❓Question
When using AIM for a distributed training task with multiple GPUs (e.g., 8 GPUs), I noticed that each GPU generates a separate run with its own hyperparameters and metrics. As a result, for a single distributed training task with 8 GPUs, a total of 8 runs are created.
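For context, the logging in each process looks roughly like this (a minimal sketch assuming a typical `torchrun` / `torch.distributed` launch; names like `train()` are stand-ins for the real script), which is what produces one run per rank:

```python
import torch.distributed as dist
from aim import Run

dist.init_process_group(backend="nccl")

# Every process constructs its own Run, so an 8-GPU job yields 8 runs.
run = Run(experiment="distributed-training")
run["hparams"] = {"lr": 1e-3, "batch_size": 256, "world_size": dist.get_world_size()}

for step, loss in enumerate(train()):  # train() is a placeholder for the real loop
    run.track(loss, name="loss", step=step, context={"subset": "train"})
```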
However, my expectation is to have only one run for the entire distributed training task, regardless of the number of GPUs used. Is this behavior expected, or is there a way to consolidate the runs into a single run for the entire task?
Having multiple runs for a single task makes it difficult to track and analyze the overall performance and metrics. It would be more convenient and intuitive to have a single run that aggregates the data from all GPUs involved in the distributed training process.
Please let me know if this behavior is intended or if there is a configuration option or workaround to achieve a single run for distributed training tasks with AIM.
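For reference, the only workaround I can think of is to create the Run on rank 0 only and skip tracking on the other ranks (a rough sketch under that assumption, not an Aim-provided mechanism; metrics from the other ranks have to be reduced to rank 0 first):

```python
import torch.distributed as dist
from aim import Run

dist.init_process_group(backend="nccl")
is_main = dist.get_rank() == 0

# Only rank 0 owns the Aim Run; the other ranks do not log anything.
run = Run(experiment="distributed-training") if is_main else None
if run is not None:
    run["hparams"] = {"lr": 1e-3, "world_size": dist.get_world_size()}

for step, loss in enumerate(train()):  # train() is a placeholder for the real loop
    # Average the loss across ranks so rank 0 tracks a global value.
    loss_t = loss.detach().clone()
    dist.all_reduce(loss_t, op=dist.ReduceOp.SUM)
    loss_t /= dist.get_world_size()
    if run is not None:
        run.track(loss_t.item(), name="loss", step=step)
```

This avoids duplicate runs, but it would be good to know whether there is a supported way to do this, or whether runs from multiple ranks can be merged on the Aim side.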