This is the code accompanying the paper "PyFed: Extending PySyft with N-IID Federated Learning Benchmark". Paper link: https://caiac.pubpub.org/pub/7yr5bkck/release/1

PyFed is a benchmarking framework for federated learning that extends PySyft in a generic and distributed way. PyFed supports different aggregation methods and data distributions: Independent and Identically Distributed (IID) and Non-IID. In this sense, PyFed is an alternative to the LEAF benchmarking framework for federated learning with PySyft.

The benchmarking is done using five datasets: mnist, fashionmnist, cifar10, sent140, shakespeare.
Tested stable dependencies are listed in requirements.txt. Use the package manager pip to install the requirements of PyFed:

```
pip install -r requirements.txt
```

| Package | Description |
|---|---|
| models | Model definitions (e.g., cnn, lstm). |
| datasets | Dataset loading and splitting (IID and Non-IID). |
| aggregation | Aggregation methods for FL. |
| run | Scripts for launching the workers and the training. |
| utils | Utility functions. |
| data | Downloading the dataset. |
| results | Results of the training. |
| experiments | Benchmarking configuration. |
To run PyFed, please follow these steps:

- Launch the workers:

```
python run/network/start_websocket_server.py [arguments]
```

- Launch the training:

```
python run/training/main.py [arguments]
```

- Get the results.
All arguments have default values; however, they should be set to the desired settings, either manually or using a config file.

Workers can be launched with different arguments (see below).
| Argument | Description |
|---|---|
| clients | The number of clients: Integer. |
| dataset | The dataset to be used: mnist, fashionmnist, cifar10, sent140, shakespeare. |
| split_mode | The split mode used: iid or niid. |
| global_dataset | Share a global dataset over all clients. |
| data_rate | Percentage of samples in the global dataset to be added: 0.x |
| add_error | Add errors to some samples: True or False. |
| error_rate | Percentage of errors to be added: 0.x |
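To illustrate what add_error and error_rate control, here is a minimal sketch of injecting label noise into a client's shard. The function name and the flip-to-a-random-other-class strategy are assumptions made for illustration; PyFed's actual error-injection code may differ.

```python
import random

def add_label_errors(labels, error_rate, num_classes, seed=0):
    """Hypothetical sketch: corrupt a fraction `error_rate` of labels
    by replacing each selected label with a different random class."""
    rng = random.Random(seed)
    labels = list(labels)
    n_errors = int(len(labels) * error_rate)
    for i in rng.sample(range(len(labels)), n_errors):
        wrong = rng.randrange(num_classes - 1)
        # Skip the true class so the label is guaranteed to change.
        labels[i] = wrong if wrong < labels[i] else wrong + 1
    return labels

# Example: corrupt 10% of 1,000 labels (error_rate = 0.1) over 10 classes.
noisy = add_label_errors([i % 10 for i in range(1000)], 0.1, num_classes=10)
```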
In the case of IID distribution (split_mode = iid), the following arguments are available:
| Argument | Description |
|---|---|
| iid_share | Share samples between clients in the iid split mode. |
| iid_rate | Percentage of samples to share between clients: 0.x |
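For intuition, here is a minimal sketch of an IID split with optional sharing. Treating iid_share/iid_rate as "copy a fraction of each shard to every other client" is an assumption for illustration; PyFed's implementation may differ.

```python
import numpy as np

def iid_split(num_samples, num_clients, iid_share=False, iid_rate=0.1, seed=0):
    """Hypothetical sketch of an IID split: shuffle all indices and deal
    them out evenly, so every client sees the same class distribution."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(num_samples)
    shards = [list(s) for s in np.array_split(indices, num_clients)]
    if iid_share:
        # Assumed sharing semantics: each client copies an iid_rate
        # fraction of its shard to every other client.
        shared = [s[: int(len(s) * iid_rate)] for s in shards]
        for i, shard in enumerate(shards):
            for j, extra in enumerate(shared):
                if i != j:
                    shard.extend(extra)
    return shards

# Example: 60,000 samples over 5 clients, sharing 10% of each shard.
shards = iid_split(60000, 5, iid_share=True, iid_rate=0.1)
```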
In the case of Non-IID distribution (split_mode = niid), the following arguments are available:
| Argument | Description |
|---|---|
| data_size | The number of samples held by each client: Integer. |
| type | The split type: random or label split. |
| label_num | The number of classes held by a client with the label split type: Integer. |
| share_samples | How to share samples between clients holding the same classes. In the case of the label split type, the possible values are 0, 1, or 2, corresponding to the Type 0, Type 1, and Type 2 columns in the Non-IID results below. |
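As a rough sketch of what a label-based Non-IID split does, the helper below gives each client i a shard of data_size[i] samples drawn from label_num[i] randomly chosen classes. This is illustrative only and is not PyFed's actual code; in particular, the share_samples strategies (how clients holding the same classes divide those samples) are omitted.

```python
import numpy as np

def label_split(labels, data_size, label_num, seed=0):
    """Hypothetical sketch of a Non-IID label split: client i receives
    data_size[i] samples drawn only from label_num[i] random classes."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    rng = np.random.default_rng(seed)
    shards = []
    for size, k in zip(data_size, label_num):
        chosen = rng.choice(classes, size=k, replace=False)
        pool = np.flatnonzero(np.isin(labels, chosen))
        shards.append(rng.choice(pool, size=size, replace=False))
    return shards

# Example matching the command below: 5 clients with different
# shard sizes and class counts.
labels = np.repeat(np.arange(10), 6000)  # a stand-in for MNIST labels
shards = label_split(labels, [234, 2134, 64, 4132, 1000], [3, 8, 5, 2, 3])
```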
Manually:

```
python run/network/start_websocket_server.py --clients 5 \
    --dataset mnist \
    --split_mode niid \
    --type label \
    --data_size [234,2134,64,4132,1000] \
    --label_num [3,8,5,2,3] \
    --share_samples 2
```
Or using a config.yml file:

```
python run/network/start_websocket_server.py -f file_name
```
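The exact schema of the config file is defined by PyFed; as a rough sketch, a config mirroring the manual example above might look like the following. The key names are assumptions based on the CLI argument names, not a confirmed format.

```yaml
# Hypothetical config.yml mirroring the manual example above.
# Key names are assumed to match the CLI argument names.
clients: 5
dataset: mnist
split_mode: niid
type: label
data_size: [234, 2134, 64, 4132, 1000]
label_num: [3, 8, 5, 2, 3]
share_samples: 2
```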
After launching the workers correctly, we are ready to start the training using the following arguments.
| Argument | Description |
|---|---|
| model | The file name (without the .py extension) of the model to be trained (see the models directory): cnn, lstm. |
| batch_size | The batch size for training: Integer. |
| test_batch_size | The batch size used for the test data: Integer. |
| training_rounds | The number of federated learning rounds: Integer. |
| federate_after_n_batches | The number of training steps performed on each remote worker before averaging: Integer. |
| lr | The learning rate: Float. |
| cuda | Use CUDA: True or False. |
| seed | The seed used for randomization: Integer. |
| eval_every | Evaluate the model every n rounds: Integer. |
| fraction_client | The number of clients that will participate in each round: Integer. |
| optimizer | The optimizer to use: SGD or Adam. |
| aggregation | The type of aggregation: federated_avg or weighted_avg. |
| loss | The loss function: nll_loss or cross_entropy. |
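To make the two aggregation options concrete, here is a minimal sketch of averaging client models: a plain parameter average corresponds to federated_avg, and weighting each client by its sample count corresponds to weighted_avg. The weighting scheme is an assumption for illustration; PyFed's implementation may differ.

```python
import torch

def aggregate(state_dicts, weights=None):
    """Sketch of model aggregation over client state_dicts.

    With weights=None this is a plain parameter average (federated_avg);
    passing per-client sample counts gives a weighted average
    (weighted_avg). Assumed semantics, for illustration only."""
    if weights is None:
        weights = [1.0] * len(state_dicts)
    total = float(sum(weights))
    avg = {}
    for key in state_dicts[0]:
        avg[key] = sum(sd[key].float() * (w / total)
                       for sd, w in zip(state_dicts, weights))
    return avg

# Example: three clients holding 234, 2134, and 64 samples respectively.
models = [{"w": torch.randn(2, 2)} for _ in range(3)]
global_state = aggregate(models, weights=[234, 2134, 64])
```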
Manually:

```
python run/training/main.py --model cnn \
    --dataset mnist \
    --batch_size 10 \
    --lr 0.1 \
    --training_rounds 100 \
    --eval_every 10 \
    --optimizer SGD \
    --aggregation federated_avg \
    --loss nll_loss
```
Or using a config.yml file:

```
python run/training/main.py -f file_name
```
The following results were obtained using the PyFed framework. You can check all the results and configurations in the experiments package.

Benchmark configuration: the total number of clients is 100.
| Dataset | Model | Epochs | Batch size | Fraction | Learning rate | Rounds |
|---|---|---|---|---|---|---|
| Cifar10 | CNN | 1 | 5 | 0.1 | 0.1 | 2500 |
| Fashionmnist | CNN | 1 | 10 | 0.1 | 0.1 | 100 |
| Mnist | CNN | 1 | 10 | 0.1 | 0.1 | 20 |
| Mnist | CNN (with batch normalisation) | 1 | 10 | 0.1 | 0.1 | 20 |
| Sent140 | LSTM | 1 | 1 | 0.1 | 0.1 | 1000 |
| Shakespeare | GRU | 1 | 1 | 0.1 | 0.8 | 2000 |
IID results:

| Dataset | Model | Accuracy (%) | Loss |
|---|---|---|---|
| Cifar10 | CNN | 67 | 0.8043 |
| Fashionmnist | CNN | 86.81 | 0.368 |
| Mnist | CNN | 95.63 | 0.1384 |
| Mnist | CNN (with batch normalisation) | 96.33 | 0.1154 |
| Sent140 | LSTM | 65.45 | 0.8345 |
| Shakespeare | GRU | 50.36 | 1.2452 |
Non-IID results. Type 0, Type 1, and Type 2 refer to the label split with the three share_samples strategies; the last two columns are for the random split.

| Dataset | Model | Type 0 Accuracy (%) | Type 0 Loss | Type 1 Accuracy (%) | Type 1 Loss | Type 2 Accuracy (%) | Type 2 Loss | Random Accuracy (%) | Random Loss |
|---|---|---|---|---|---|---|---|---|---|
| Cifar10 | CNN | 66.78 | 0.8132 | 65.89 | 0.8453 | 65.45 | 0.8464 | 66.89 | 0.8121 |
| Fashionmnist | CNN | 85.36 | 0.4029 | 85.8 | 0.3956 | 85.42 | 0.4009 | 86.57 | 0.3727 |
| Mnist | CNN | 93.45 | 0.2171 | 93.88 | 0.2164 | 93.84 | 0.2086 | 95.04 | 0.1671 |
| Mnist | CNN (with batch normalisation) | 94.25 | 0.1902 | 94.74 | 0.1771 | 94.76 | 0.1884 | 96.09 | 0.13 |
| Sent140 | LSTM | 64.4 | 0.9244 | 64.23 | 0.9445 | 65.78 | 0.8123 | 65.1 | 0.8663 |
| Shakespeare | GRU | 48.26 | 1.3452 | 48.76 | 1.2052 | 45.23 | 1.7452 | 49.46 | 1.2952 |
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.