| Track and version ML runs | Visualize runs via beautiful UI | Query runs metadata via SDK | 
|---|---|---|
Aim is an open-source, self-hosted ML experiment tracking tool. It is built to track large numbers (1000s) of training runs and lets you compare them in a performant and beautiful UI.
Besides the Aim UI, you can use the SDK to query your runs' metadata programmatically. That is especially useful for automation and for additional analysis in a Jupyter notebook.
Aim's mission is to democratize AI dev tools.
- Compare, group and aggregate 100s of metrics thanks to effective visualizations.
- Analyze, learn correlations and patterns between hparams and metrics.
- Easy pythonic search to query the runs you want to explore.
- Hyperparameters, metrics, images, distributions, audio, text - all available at hand on an intuitive UI to understand the performance of your model.
- Easily track plots built via your favourite visualisation tools, like plotly and matplotlib.
- Analyze system resource usage to effectively utilize computational resources.
- Centralized dashboard to holistically view all your runs, their hparams and results.
- Use the SDK to query/access all your runs and tracked metadata.
- You own your data - Aim is open source and self-hosted.
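Once a metric's steps and values have been read out through the SDK, further analysis is ordinary Python. Here is a minimal sketch, assuming the data arrives as two equal-length sequences of steps and values. `summarize_metric` is a hypothetical helper for illustration and is not part of Aim.

```python
def summarize_metric(steps, values):
    """Return the last and the lowest value of one metric, with their steps."""
    # Pair each step with its value and order the pairs by step.
    pairs = sorted(zip(steps, values), key=lambda pair: pair[0])
    if not pairs:
        raise ValueError("metric has no tracked values")

    last_step, last_value = pairs[-1]
    best_step, best_value = min(pairs, key=lambda pair: pair[1])
    return {
        "last_step": int(last_step),
        "last_value": float(last_value),
        "best_step": int(best_step),
        "best_value": float(best_value),
    }


# Made-up loss curve, tracked out of order on purpose.
summary = summarize_metric(steps=[2, 0, 1, 3], values=[0.5, 0.9, 0.4, 0.45])
print(summary)
```

The helper only iterates over its inputs, so plain lists and array-like sequences are handled the same way.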
| Machine translation | lightweight-GAN | 
|---|---|
| Training logs of a neural translation model (from the WMT'19 competition). | Training logs of the 'lightweight' GAN, proposed at ICLR 2021. | 
| FastSpeech 2 | Simple MNIST | 
|---|---|
| Training logs of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech". | Simple MNIST training logs. | 
Follow the steps below to get started with Aim.
1. Install Aim on your training environment
pip3 install aim
2. Integrate Aim with your code
from aim import Run, Image, Distribution
  
# Initialize a new run
run = Run()
# Log run parameters
run["hparams"] = {
    "learning_rate": 0.001,
    "batch_size": 32,
}
# Log artefacts
for step in range(1000):
    # Log metrics
    run.track(loss_val, name='loss', step=step, context={ "subset": "train" })
    run.track(accuracy_val, name='acc', step=step, context={ "subset": "train" })
  
    # Log images
    run.track(Image(tensor_or_pil, caption), name='gen', step=step, context={ "subset": "train" })
    # Log distributions
    run.track(Distribution(tensor), name='gradients', step=step, context={ "type": "weights" })
See documentation here.
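The snippet above leaves `loss_val`, `accuracy_val` and the tensors undefined. Below is a minimal sketch of the same tracking loop with synthetic metric values. `RecordingRun` and `track_synthetic_training` are hypothetical names used for illustration only. `RecordingRun` is an in-memory stand-in that mimics just the two `Run` calls shown above (`run["hparams"] = ...` and `run.track(...)`), so the loop can be tried without Aim installed.

```python
import math


class RecordingRun:
    """Hypothetical stand-in for aim.Run that keeps everything in memory."""

    def __init__(self):
        self.params = {}
        self.records = []

    def __setitem__(self, key, value):
        # Mirrors `run["hparams"] = {...}` from the example above.
        self.params[key] = value

    def track(self, value, name, step, context=None):
        # Mirrors `run.track(value, name=..., step=..., context=...)`.
        self.records.append((name, step, value, dict(context or {})))


def track_synthetic_training(run, num_steps=100):
    """Log hyperparameters plus synthetic loss/accuracy curves to `run`."""
    run["hparams"] = {"learning_rate": 0.001, "batch_size": 32}
    for step in range(num_steps):
        # Synthetic placeholders for real training metrics:
        # an exponentially decaying loss and the matching rising accuracy.
        loss_val = math.exp(-step / 20)
        accuracy_val = 1.0 - loss_val
        run.track(loss_val, name="loss", step=step, context={"subset": "train"})
        run.track(accuracy_val, name="acc", step=step, context={"subset": "train"})
    return run


run = track_synthetic_training(RecordingRun(), num_steps=5)
print(run.params["hparams"], len(run.records))
```

In a real training script, an `aim.Run()` instance would take the place of `RecordingRun()`, with the synthetic values replaced by the metrics your training loop computes.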
3. Run the training as usual and start Aim UI
aim up
4. Or query runs programmatically via SDK
from aim import Repo
my_repo = Repo('/path/to/aim/repo')
query = "metric.name == 'loss'" # Example query
# Get collection of metrics
for run_metrics_collection in my_repo.query_metrics(query).iter_runs():
    for metric in run_metrics_collection:
        # Get run params
        params = metric.run[...]
        # Get metric values
        steps, metric_values = metric.values.sparse_numpy()
Integrate PyTorch Lightning
from aim.pytorch_lightning import AimLogger
# ...
trainer = pl.Trainer(logger=AimLogger(experiment='experiment_name'))
# ...
See documentation here.
Integrate Hugging Face
from aim.hugging_face import AimCallback
# ...
aim_callback = AimCallback(repo='/path/to/logs/dir', experiment='mnli')
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset if training_args.do_train else None,
    eval_dataset=eval_dataset if training_args.do_eval else None,
    callbacks=[aim_callback],
    # ...
)
# ...
See documentation here.
Integrate Keras & tf.keras
import aim
# ...
model.fit(x_train, y_train, epochs=epochs, callbacks=[
    aim.keras.AimCallback(repo='/path/to/logs/dir', experiment='experiment_name'),

    # In case of tf.keras, use aim.tensorflow.AimCallback instead:
    # aim.tensorflow.AimCallback(repo='/path/to/logs/dir', experiment='experiment_name'),
])
# ...
See documentation here.
Integrate XGBoost
from aim.xgboost import AimCallback
# ...
aim_callback = AimCallback(repo='/path/to/logs/dir', experiment='experiment_name')
bst = xgb.train(param, xg_train, num_round, watchlist, callbacks=[aim_callback])
# ...
See documentation here.
Training run comparison
Order-of-magnitude faster training run comparison with Aim
- Tracked params are first-class citizens in Aim. You can search, group and aggregate via params, and deeply explore all the tracked data (metrics, params, images) in the UI.
- With TensorBoard, users are forced to record those parameters in the training run name to be able to search and compare. This makes comparison super tedious and causes usability issues in the UI when there are many experiments and params. TensorBoard doesn't have features to group or aggregate the metrics.
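To make "group and aggregate via params" concrete, here is a plain-Python sketch of the idea: runs are grouped by the value of one tracked hyperparameter and a metric is averaged per group. The run records are made-up data and `group_final_loss` is a hypothetical helper. This illustrates the concept only and is not Aim's implementation or API.

```python
from collections import defaultdict
from statistics import mean


def group_final_loss(runs, param):
    """Group runs by one hyperparameter and average their final loss."""
    groups = defaultdict(list)
    for run in runs:
        # The param is structured data, so no run-name parsing is needed.
        groups[run["hparams"][param]].append(run["loss"][-1])
    return {value: mean(losses) for value, losses in sorted(groups.items())}


# Made-up runs: each one carries its hparams and a tracked loss curve.
runs = [
    {"hparams": {"learning_rate": 0.001, "batch_size": 32}, "loss": [0.9, 0.5, 0.25]},
    {"hparams": {"learning_rate": 0.001, "batch_size": 64}, "loss": [0.9, 0.6, 0.75]},
    {"hparams": {"learning_rate": 0.01, "batch_size": 32}, "loss": [0.9, 0.4, 0.125]},
]

print(group_final_loss(runs, "learning_rate"))
print(group_final_loss(runs, "batch_size"))
```

When the params live only in the run name, the same grouping first requires parsing every name, which is the tedious step described above.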
Scalability
- Aim is built to handle 1000s of training runs - both on the backend and on the UI.
- TensorBoard becomes really slow and hard to use when a few hundred training runs are queried / compared.
Beloved TB visualizations to be added to Aim
- Embedding projector.
- Neural network visualization.
MLflow is an end-to-end ML lifecycle tool. Aim is focused on training tracking. The main differences between Aim and MLflow are in UI scalability and run comparison features.
Run comparison
- Aim treats tracked parameters as first-class citizens. Users can query runs, metrics, images and filter using the params.
- MLflow does have search by tracked config, but it has no grouping, aggregation, subplotting by hyperparams or other comparison features.
UI Scalability
- Aim UI can smoothly handle several thousand metrics at the same time, each with 1000s of steps. It may get shaky when you explore 1000s of metrics with 10000s of steps each. But we are constantly optimizing!
- MLflow UI becomes slow to use when there are a few hundred runs.
Hosted vs self-hosted
- Weights and Biases is a hosted closed-source MLOps platform.
- Aim is a self-hosted, free and open-source experiment tracking tool.
❇️ The Aim product roadmap
- The Backlog contains the issues we are going to choose from and prioritize weekly
- The issues are mainly prioritized by how highly requested the features are
The high-level features we are going to work on in the next few months
- Live updates (Shipped: Oct 18 2021)
- Images tracking and visualization (Start: Oct 18 2021, Shipped: Nov 19 2021)
- Distributions tracking and visualization (Start: Nov 10 2021, Shipped: Dec 3 2021)
- Jupyter integration (Start: Nov 18 2021, Shipped: Dec 3 2021)
- Audio tracking and visualization (Start: Dec 6 2021, Shipped: Dec 17 2021)
- Transcripts tracking and visualization (Start: Dec 6 2021, Shipped: Dec 17 2021)
- Plotly integration (Start: Dec 1 2021, Shipped: Dec 17 2021)
- Colab integration (Start: Nov 18 2021, Shipped: Dec 17 2021)
- Centralized tracking server (Start: Oct 18 2021, Shipped: Jan 22 2022)
- TensorBoard adaptor - visualize TensorBoard logs with Aim (Start: Dec 17 2021, Shipped: Feb 3 2022)
- Track git info, env vars, CLI arguments, dependencies (Start: Jan 17 2022, Shipped: Feb 3 2022)
- Scikit-learn integration (Start: Nov 18 2021)
- MLflow adaptor (visualize MLflow logs with Aim) (Start: Feb 14 2022)
Aim UI
- Runs management
  - Runs explorer – query and visualize runs data (images, audio, distributions, ...) in a central dashboard
  - Single run page
    - Run summary and overview info (system params, CLI args, git info, ...)
    - Run execution details (display stdout/stderr logs)
    - Run notes
- Explorers
  - Audio Explorer
  - Text Explorer
  - Figures Explorer
  - Distributions Explorer
- Dashboards – customizable layouts with embedded explorers
SDK and Storage
- Cloud-native support
  - Cloud storage support – store runs' blob data (e.g. images) on the cloud
  - Artifact storage – store files, model checkpoints, and beyond
- Scalability
  - Smooth UI and SDK experience with over 10,000 runs
  - Long sequences (up to 5M steps) support
- Runs management
  - SDK interfaces
    - Reporting – query and compare runs, explore data with familiar tools such as matplotlib and pandas
    - Manipulations – copy, move, delete runs, params and sequences
  - CLI interfaces
    - Reporting – runs summary and run details in a CLI-compatible format
    - Manipulations – copy, move, delete runs, params and sequences
- SDK interfaces
Integrations
- ML Frameworks:
  - Shortlist: PyTorch-Ignite, MONAI, scikit-learn, SpaCy, AllenNLP, LightGBM, Raytune, Fairseq, fast.ai, KerasTuner
- Datasets versioning tools
  - Shortlist: Activeloop Hub, DVC, HuggingFace Datasets
- Resource management tools
  - Shortlist: Kubeflow, Slurm
- Workflow orchestration tools
- Others: Hydra, Google MLMD, Streamlit, ...