Tracking simple metrics in remote server is really slow #3191

@diogo-sr

🐛 Bug

I have been using AIM to track item detection experiments. We run a back end on one of our remote servers to track our training and evaluation data. The data consists of either floats or images (mostly numpy.ndarray[numpy.uint8]). I have observed massive performance differences between tracking data to a remote AIM server and to a local AIM server running on my laptop.
For instance, tracking a JSON file with 3000 lines (see the attachment in the To reproduce section) takes more than 15 minutes to push to the remote server, while the exact same job takes less than 10 seconds locally(!).

I have tried to debug this by pushing batches of data instead of making one call per metric, but nothing seems to make a difference. To add more unexpected information to the picture, tracking 95 images (each approx. 4 MB) to the exact same server took only one minute. I think this means that the delay is not related to the size of the data being tracked (the images total almost 400 MB, while the raw JSON data is 4.6 MB) 🤷‍♂️
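To put these figures side by side, the rough arithmetic below computes the effective throughput of both uploads. It uses only the approximate numbers reported in this issue (95 images of about 4 MB in about one minute, and a 4.6 MB JSON file in about 15 minutes); the function name `throughput_mb_per_s` is purely illustrative.

```python
def throughput_mb_per_s(size_mb: float, seconds: float) -> float:
    """Effective upload throughput in MB/s."""
    return size_mb / seconds


# Approximate figures taken from the report; none of these are measured here.
images_mb = 95 * 4          # 95 images, roughly 4 MB each
images_seconds = 60         # "only one minute"
metrics_mb = 4.6            # size of the raw JSON data
metrics_seconds = 15 * 60   # "more than 15 minutes", so a lower bound on the time

images_rate = throughput_mb_per_s(images_mb, images_seconds)
metrics_rate = throughput_mb_per_s(metrics_mb, metrics_seconds)

print(f"images:  {images_rate:.2f} MB/s")
print(f"metrics: {metrics_rate * 1024:.1f} KB/s")
print(f"ratio:   {images_rate / metrics_rate:.0f}x")
```

A gap of roughly three orders of magnitude is consistent with the cost scaling with the number of tracked values rather than with the number of bytes, although that is an inference from these numbers and not a measurement.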

I would really appreciate it if someone could shed some light on this: whether this difference in performance is expected, or whether there are any optimizations (in terms of tracking, hardware, ...) we could use to speed it up, because as it works now it is really not usable.

To reproduce

  • Start remote AIM server
  • Load metrics.json and track each metric
  • Code snippet used to reproduce:
import os
import json
import numpy
from aim import Run

repo="aim://my_aim_server"
path_to_metrics_json="abs_path_metrics.json"

# Start run
logger = Run(experiment="back-end-test", repo=repo)

# Format metrics to proper json
new_metrics_path = "/tmp/new_metrics.json"
if os.path.exists(new_metrics_path):
    os.remove(new_metrics_path)
os.system(f"cat {path_to_metrics_json} | jq -s '.[0:]' >> {new_metrics_path}")

with open(new_metrics_path) as json_data:
    metrics_data = json.load(json_data)

# Log train metrics
for metrics_dict in metrics_data:
    for k, v in metrics_dict.items():
        if not v:
            v = numpy.nan

        logger.track(float(v), k, step=metrics_dict["iteration"])
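
To narrow down where the time goes, one option is to time each `track` call individually. The sketch below is a generic helper and not part of Aim: `time_calls`, `summarize`, and `fake_track` are hypothetical names, and the helper accepts any callable so it can be tried without a server.

```python
import statistics
import time
from typing import Callable, Iterable, List, Tuple


def time_calls(fn: Callable[..., object],
               calls: Iterable[Tuple[tuple, dict]]) -> List[float]:
    """Invoke fn(*args, **kwargs) per entry and return each call's duration in seconds."""
    durations = []
    for args, kwargs in calls:
        start = time.perf_counter()
        fn(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> dict:
    """Basic statistics over the measured durations."""
    return {
        "calls": len(durations),
        "total_s": sum(durations),
        "mean_s": statistics.mean(durations),
        "max_s": max(durations),
    }


if __name__ == "__main__":
    # Stand-in for logger.track so the sketch runs without an Aim server.
    def fake_track(value, name, step=None):
        time.sleep(0.001)

    # Each entry is (positional args, keyword args), mirroring the track call above.
    calls = [((float(i), "loss"), {"step": i}) for i in range(20)]
    print(summarize(time_calls(fake_track, calls)))
```

In the reproduction script, `fake_track` could be swapped for `logger.track` and `calls` built from `metrics_data`. Comparing the mean per-call time against the local and the remote server would show whether each call pays a roughly constant cost.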

Expected behavior

Pushing the metrics should not take more than 15 minutes.

Environment

  • Aim Version 3.20.1
  • Python version 3.9.18
  • OS Ubuntu 22 LTS

Labels: help wanted, type / bug
