HARP is a transferable neural network for WAN Traffic Engineering that is designed to handle changing topologies. It was published at ACM SIGCOMM 2024.
If you use this code, please cite:
```bibtex
@inproceedings{HARP,
  author = {AlQiam, Abd AlRhman and Yao, Yuanjun and Wang, Zhaodong and Ahuja, Satyajeet Singh and Zhang, Ying and Rao, Sanjay G. and Ribeiro, Bruno and Tawarmalani, Mohit},
  title = {Transferable Neural WAN TE for Changing Topologies},
  year = {2024},
  isbn = {9798400706141},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3651890.3672237},
  doi = {10.1145/3651890.3672237},
  booktitle = {Proceedings of the ACM SIGCOMM 2024 Conference},
  pages = {86--102},
  numpages = {17},
  keywords = {traffic engineering, wide-area networks, network optimization, machine learning},
  location = {Sydney, NSW, Australia},
  series = {ACM SIGCOMM '24}
}
```
Please contact [email protected] for any questions.
HARP was tested using the following setup:
- Ubuntu 22.04 machine
- Python 3.10.6
- torch==2.1.0+cu121
- torch-scatter==2.1.2
- Check the rest in requirements.txt
- Install the required Python packages listed in requirements.txt:
  ```bash
  pip3 install -r requirements.txt
  ```
- Please follow this link to install a version of PyTorch that fits your environment (CPU/GPU).
- Identify and copy a URL suitable for the PyTorch and CUDA/CPU versions installed in the previous step, then run:
  ```bash
  pip install --no-index torch-scatter -f [URL]
  ```
- Follow the Gurobi Website to install and set up the Gurobi Optimizer.
- In the `manifest` folder, provide a `txt` file that holds the topology name and specifies, for every time step, the `topology_file.json`, `set_of_pairs_file.pkl`, and `traffic_matrix.pkl` files to be read at that time step. For every time step, a corresponding file of these three must exist in the `topologies`, `pairs`, and `traffic_matrices` folders inside a directory with the topology name; a hypothetical manifest example is sketched below.
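The exact manifest syntax is defined by the repository's example files; purely as a hypothetical illustration (all file names below are invented), a manifest might name the topology and then list the three files to load at each time step:

```
TopoName
topology_t0.json set_of_pairs_t0.pkl traffic_matrix_t0.pkl
topology_t1.json set_of_pairs_t1.pkl traffic_matrix_t1.pkl
```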
- For details on the data format, please check the Data Format section below.
- To compute optimal values and clusterize your dataset, run:
  ```bash
  python3 frameworks/gurobi_mlu.py --num_paths_per_pair 15 --opt_start_idx 0 --opt_end_idx 2000 --topo TopoName --framework gurobi
  ```
- Please refer to our paper for the definition of a "cluster" in this context.
- To train, run (for example):
  ```bash
  python3 run_harp.py --topo TopoName --mode train --epochs 100 --batch_size 32 --lr 0.001 --num_paths_per_pair 8 --num_transformer_layers 2 --num_gnn_layers 3 --num_mlp1_hidden_layers 2 --num_mlp2_hidden_layers 2 --num_for_loops 3 --train_clusters 0 1 2 3 --train_start_indices 0 0 0 0 --train_end_indices 200 200 200 200 --val_clusters 4 5 --val_start_indices 0 0 --val_end_indices 90 90 --framework harp --pred 0 --dynamic 1

  python3 run_harp.py --topo abilene --mode train --epochs 100 --lr 0.007 --batch_size 32 --num_paths_per_pair 8 --num_transformer_layers 2 --num_gnn_layers 3 --num_mlp1_hidden_layers 1 --num_mlp2_hidden_layers 1 --num_for_loops 3 --train_clusters 0 --train_start_indices 0 --train_end_indices 12096 --val_clusters 0 --val_start_indices 12096 --val_end_indices 14112 --framework harp --pred 0 --dynamic 0

  python3 run_harp.py --topo kdl --mode train --epochs 100 --lr 0.007 --batch_size 8 --num_paths_per_pair 4 --num_transformer_layers 1 --num_gnn_layers 1 --num_mlp1_hidden_layers 1 --num_mlp2_hidden_layers 1 --num_for_loops 3 --train_clusters 0 --train_start_indices 0 --train_end_indices 170 --val_clusters 0 --val_start_indices 170 --val_end_indices 200 --framework harp --pred 0 --dynamic 0
  ```
- To test, run (for example):
  ```bash
  python3 run_harp.py --topo TopoName --mode test --num_paths_per_pair 15 --num_for_loops 14 --test_cluster 6 --test_start_idx 0 --test_end_idx 150 --framework harp --pred 0 --dynamic 1

  python3 run_harp.py --topo abilene --mode test --num_paths_per_pair 8 --num_for_loops 3 --test_cluster 0 --test_start_idx 14112 --test_end_idx 16128 --framework harp --pred 0 --dynamic 0

  python3 run_harp.py --topo kdl --mode test --num_paths_per_pair 8 --num_for_loops 3 --test_cluster 0 --test_start_idx 200 --test_end_idx 278 --framework harp --pred 0 --dynamic 0
  ```
- Note that only one cluster is allowed per testing-mode run.
- For further explanation of the command-line arguments, see the Command Line Arguments Explanation section below.
- Download `AbileneTM-all.tar` from this link and decompress it (twice) inside the `prepare_abilene` folder:
  ```bash
  cd prepare_abilene
  wget https://www.cs.utexas.edu/~yzhang/research/AbileneTM/AbileneTM-all.tar
  tar -xvf AbileneTM-all.tar
  gunzip *.gz
  ```
- Then, run:
  ```bash
  python3 prepare_abilene_harp.py
  ```
- This example should serve as a reference for how to prepare any dataset.
- To download the GEANT traffic matrices, execute:
  ```bash
  wget --content-disposition "https://app.box.com/shared/static/shzgaxnt36org6dmu9q228kzk28numue?dl=1" -P traffic_matrices/
  ```
- A preprocessed copy of the GEANT dataset in the format needed by HARP is available at this link.
- Update (09/30/2024): GEANT matrices were scaled down to have the same unit as the capacities.
- To download the KDL traffic matrices, execute:
  ```bash
  wget --content-disposition "https://app.box.com/shared/static/qyq2zt160hxmmrwnt1eg792vctjmg64b?dl=1" -P traffic_matrices/
  ```
- A preprocessed copy of the KDL dataset in the format needed by HARP is available at this link.
- By default, HARP trains over ground-truth matrices.
- Running HARP with `--pred 1` trains it over predicted matrices rather than ground-truth matrices.
- An ESM (Exponential Smoothing) predictor is provided in the `traffic_matrices` directory (see the sketch after this list). To use it, run:
  ```bash
  python3 esm_predictor.py TopoName
  ```
- Alternatively, provide predicted traffic matrices from a predictor of your choice and put them inside the `traffic_matrices` directory, in a folder named `TopoName_PredType`.
  - For example, for the GEANT dataset, original matrices go under the `GEANT` directory, whereas predicted matrices go under the `GEANT_PredType` directory.
  - Make sure that at every time step the predicted matrix corresponds to the ground-truth matrix at that time step. For example, `t100.pkl` in the `GEANT` and `GEANT_PredType` folders correspond to each other.
- You can specify `--pred_type` (default: `esm`) to indicate the predictor type if you manage multiple predicted datasets.
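For intuition, here is a minimal sketch of exponential smoothing over a sequence of traffic matrices. This is not the repo's `esm_predictor.py`: the smoothing factor, file layout, and the `GEANT_esm` folder name are illustrative assumptions based on the format described in the Data Format section.

```python
# Hypothetical exponential-smoothing (ESM) predictor sketch.
# Assumes pickled NumPy arrays of shape (num_pairs, 1) named t<idx>.pkl.
import os
import pickle
import numpy as np

def exp_smooth(history, alpha=0.5):
    """Fold the history into a smoothed estimate for the next time step."""
    pred = history[0].astype(np.float64)
    for tm in history[1:]:
        pred = alpha * tm + (1 - alpha) * pred  # standard ESM update
    return pred

# Predict t100 of GEANT from the preceding ground-truth matrices.
history = []
for t in range(100):
    with open(f"traffic_matrices/GEANT/t{t}.pkl", "rb") as f:
        history.append(pickle.load(f))

os.makedirs("traffic_matrices/GEANT_esm", exist_ok=True)
with open("traffic_matrices/GEANT_esm/t100.pkl", "wb") as f:
    pickle.dump(exp_smooth(history), f)
```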
- After training a HARP model on GEANT and Abilene, run:
  ```bash
  python3 run_failures.py --topo geant --num_paths_per_pair 8 --num_for_loops X --test_start_idx start --test_end_idx end --pred 0 --test_cluster 0

  python3 run_failures.py --topo abilene --num_paths_per_pair 8 --num_for_loops X --test_start_idx start --test_end_idx end --pred 0 --test_cluster 0
  ```
- This computes the optimal value for each failure scenario and then runs HARP on that scenario.
- Traffic matrices: NumPy array of shape `(num_pairs, 1)`.
- Pairs: NumPy array of shape `(num_pairs, 2)`.
- Note: the k-th demand in the traffic matrix must correspond to the k-th pair in the set-of-pairs file, and this correspondence must be preserved across all snapshots. We suggest sorting the hash map (pairs as keys, demands as values) before separating, as in the sketch below.
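To make the alignment concrete, here is a small sketch (the output paths are illustrative) that builds both arrays from a demand map, sorting the pairs first so the k-th row of each file matches:

```python
# Sketch: convert a {(src, dst): demand} map into the pairs and
# traffic-matrix arrays described above, with one canonical ordering.
import pickle
import numpy as np

demands = {(0, 1): 10.0, (1, 0): 4.0, (0, 2): 7.0}  # toy example

pairs_sorted = sorted(demands)                        # fix the ordering once
pairs = np.array(pairs_sorted)                        # shape (num_pairs, 2)
tm = np.array([[demands[p]] for p in pairs_sorted])   # shape (num_pairs, 1)

# Illustrative destinations; follow the folder layout described earlier.
with open("pairs/TopoName/t0.pkl", "wb") as f:
    pickle.dump(pairs, f)
with open("traffic_matrices/TopoName/t0.pkl", "wb") as f:
    pickle.dump(tm, f)
```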
- Paths: By default, HARP computes the K shortest paths and automatically puts them in the correct folders and format.
- If you wish to use your own paths (see the sketch after this list):
  - Create a Python dictionary where the keys are the pairs and the values are lists of $K$ lists, where each inner list is a sequence of edges. For example: `{(s, t): [[(s, a), (a, t)], [(s, a), (a, b), (b, t)]]}`.
  - Put it inside `topologies/paths_dict` and name it `TopoName_K_paths_dict_cluster_NumCluster.pkl`. For example: `abilene_8_paths_dict_cluster_0.pkl`.
  - Make sure all pairs have the same number of paths (replicate if needed).
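One way to produce such a dictionary is sketched below, assuming `networkx` (HARP computes K shortest paths for you by default, so this is only needed for custom paths):

```python
# Sketch: build a {pair: K edge-lists} dictionary and pickle it under
# the naming convention above. The topology and K are toy values.
import pickle
from itertools import islice
import networkx as nx

K = 2
G = nx.Graph([("s", "a"), ("a", "t"), ("a", "b"), ("b", "t")])

paths_dict = {}
for s in G:
    for t in G:
        if s == t:
            continue
        node_paths = list(islice(nx.shortest_simple_paths(G, s, t), K))
        edge_paths = [list(zip(p, p[1:])) for p in node_paths]
        while len(edge_paths) < K:  # replicate so every pair has exactly K paths
            edge_paths.append(edge_paths[-1])
        paths_dict[(s, t)] = edge_paths

with open("topologies/paths_dict/TopoName_2_paths_dict_cluster_0.pkl", "wb") as f:
    pickle.dump(paths_dict, f)
```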
| Flag | Meaning | Notes |
|---|---|---|
| framework | Determines the framework that solves the problem [harp, gurobi]. | |
| num_heads | Number of transformer attention heads [int]. | By default, it is equal to the number of GNN layers. |
| num_for_loops | Determines HARP's number of RAUs. | |
| dynamic | Set to 1 if your topology varies across snapshots; set to 0 if it is static. | In our paper, the AnonNet network is dynamic, while GEANT, Abilene, and KDL are static. This CLA is useful for saving GPU memory when training on a (static) topology that does not change across snapshots. |
| dtype | Determines the dtype of HARP and its data [float32, float16], corresponding to [torch.float32, torch.bfloat16]. | The default is float32. |
| checkpoint | Enables/disables gradient checkpointing while training HARP to reduce the memory footprint. | Gradient checkpointing trades time for memory at the level of the mini-batch (see the sketch after this table). Default: 0 (disabled). |
| meta_learning | Turns meta-learning on/off. When enabled, HARP is trained on the GEANT dataset for a couple of epochs before training on the desired dataset. | Generally useful, especially when the network/dataset is highly dynamic with many failures and HARP might struggle to converge or not converge at all; meta-learning significantly improves convergence. |
| pred_type | Label for the predicted-TM source. | Default: esm. |
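The trade-off behind the `checkpoint` flag can be illustrated with generic PyTorch gradient checkpointing (a minimal sketch, not HARP's internal code):

```python
# Generic gradient checkpointing: activations inside `block` are not
# stored during the forward pass; they are recomputed during backward,
# trading extra compute time for a smaller memory footprint.
import torch
from torch.utils.checkpoint import checkpoint

block = torch.nn.Sequential(torch.nn.Linear(256, 256), torch.nn.ReLU())
x = torch.randn(32, 256, requires_grad=True)

y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
```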