Description
🐛 Bug
I have been using Aim to track item detection experiments. We have a back-end running on one of our remote servers that we use to track our training and evaluation data. The data consists of either float or image data (mostly numpy.ndarray of numpy.uint8). I have observed massive performance differences between tracking data to a remote Aim server and to a local Aim server running on my laptop.
For instance, tracking a JSON file with 3000 lines (see attachment in the To reproduce section) takes more than 15 minutes to push to the remote server, while the exact same job takes less than 10 seconds locally(!).
I have tried to debug this by pushing batches of data instead of making one call per metric, but nothing seems to make a difference. To add more unexpected information to the picture, tracking 95 images (each approximately 4 MB) to the exact same server took only one minute. I think this means the delay is not related to the size of the data being tracked (the images total almost 400 MB while the raw JSON data is 4.6 MB) 🤷‍♂️
I would really appreciate it if someone could shed some light on this: is this difference in performance expected, and are there any optimizations (tracking, hardware, ...) we could use to speed it up? As it works now, it is really not usable.
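To help narrow down whether per-call round trips dominate, timing each `track` call individually may be useful. A minimal sketch — `DummyLogger` here is a hypothetical stand-in so the snippet runs without a server; swap in the real `Run` instance pointed at the remote repo:

```python
import time


class DummyLogger:
    """Hypothetical stand-in for an aim.Run instance; replace with the
    real logger to measure actual remote-tracking latency."""

    def track(self, value, name, step=None):
        pass


def timed_track(logger, value, name, step=None):
    """Call logger.track(...) and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    logger.track(value, name, step=step)
    return time.perf_counter() - start


# Time 100 calls and report the mean per-call latency
elapsed = [timed_track(DummyLogger(), float(i), "loss", step=i) for i in range(100)]
print(f"mean per-call latency: {sum(elapsed) / len(elapsed) * 1000:.3f} ms")
```

If the mean per-call latency against the remote server is on the order of a network round trip, that would point at one-RPC-per-metric behavior rather than payload size.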
To reproduce
- Start remote AIM server
- Load metrics.json and track each metric
- Code snippet used to reproduce:
```python
import os
import json

import numpy
from aim import Run

repo = "aim://my_aim_server"
path_to_metrics_json = "abs_path_metrics.json"

# Start run
logger = Run(experiment="back-end-test", repo=repo)

# Format metrics to proper json
new_metrics_path = "/tmp/new_metrics.json"
if os.path.exists(new_metrics_path):
    os.remove(new_metrics_path)
os.system(f"cat {path_to_metrics_json} | jq -s '.[0:]' >> {new_metrics_path}")

with open(new_metrics_path) as json_data:
    metrics_data = json.load(json_data)

# Log train metrics
for metrics_dict in metrics_data:
    for k, v in metrics_dict.items():
        if not v:
            v = numpy.nan
        logger.track(float(v), k, step=metrics_dict["iteration"])
```
Expected behavior
Pushing the metrics should not take more than 15 minutes.
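As an aside, the `jq` shell-out in the snippet above can be avoided by parsing the file directly in Python, which removes an external dependency. A minimal sketch, assuming the metrics file contains one JSON object per line (the file name below is hypothetical):

```python
import json


def load_concatenated_json(path):
    """Parse a file of newline-delimited JSON objects into a list,
    roughly equivalent to slurping with `jq -s '.[0:]'` for
    one-object-per-line input."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]


# Example usage (hypothetical path):
# metrics_data = load_concatenated_json("abs_path_metrics.json")
```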
Environment
- Aim Version 3.20.1
- Python version 3.9.18
- OS Ubuntu 22 LTS