This repository provides an implementation of a framework for noise-efficient differentially private dataset distillation. The code is adapted from Differentially Private Dataset Condensation.
The framework introduces Decoupled Optimization and Sampling (DOS) and Subspace discovery for Error Reduction (SER) to improve the utility of distilled datasets under differential privacy constraints.
See requirements.txt for the required dependencies.
To calculate the noise scale (sigma) for a given privacy budget:
python compute_sigma_with_fixed_budget.py

Specify your privacy budget and dataset parameters in the script.
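For intuition, the search for sigma can be sketched as follows. This is a minimal illustration and not the logic of compute_sigma_with_fixed_budget.py: it assumes plain composition of Gaussian mechanisms with unit sensitivity and no subsampling amplification, so it will generally report a larger sigma than an accountant that exploits subsampling. The function names, the grid of Renyi orders, and the example values of epsilon, delta, and steps are all hypothetical.

```python
import math


def epsilon_for_sigma(sigma, steps, delta, orders=None):
    """Epsilon of `steps` composed Gaussian mechanisms (sensitivity 1).

    Uses the Renyi DP bound alpha / (2 * sigma^2) per step and the standard
    conversion eps = rdp + log(1 / delta) / (alpha - 1), minimised over a
    grid of orders. Subsampling is NOT modelled (assumption of this sketch).
    """
    if orders is None:
        orders = [1 + x / 10.0 for x in range(1, 1000)]
    best = float("inf")
    for alpha in orders:
        rdp = steps * alpha / (2.0 * sigma ** 2)
        eps = rdp + math.log(1.0 / delta) / (alpha - 1.0)
        best = min(best, eps)
    return best


def sigma_for_budget(target_eps, delta, steps, lo=1e-2, hi=1e4, tol=1e-4):
    """Binary-search the smallest sigma whose epsilon stays within budget."""
    if epsilon_for_sigma(hi, steps, delta) > target_eps:
        raise ValueError("budget not reachable within the search range")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if epsilon_for_sigma(mid, steps, delta) <= target_eps:
            hi = mid  # mid is private enough, try less noise
        else:
            lo = mid  # mid is too noisy-free, need more noise
    return hi


if __name__ == "__main__":
    # Hypothetical budget: epsilon = 1, delta = 1e-5, 1000 noisy steps.
    print(sigma_for_budget(target_eps=1.0, delta=1e-5, steps=1000))
```

The search relies on epsilon decreasing monotonically as sigma grows, which holds for this bound because every term in the minimum is non-increasing in sigma.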
Using the computed sigma, run the distillation process:
CUDA_VISIBLE_DEVICES=1 python dosser.py \
--sampling_iteration 1000 \
--training_iteration 200000 \
--dataset CIFAR10 \
--aux_path /data/rzheng/sd_cifar10_50000_96 \
--aux_ipc 100 \
--ser_dim 1000 \
--SER --PEA

--sampling_iteration: Number of sampling iterations.
--training_iteration: Number of optimization iterations.
--dataset: Dataset to be used (e.g., CIFAR10).
--aux_path: Path to the auxiliary dataset.
--aux_ipc: Images per class for the auxiliary dataset.
--ser_dim: Dimension of the subspace for SER.
--SER: Enable Subspace discovery for Error Reduction (SER).
--PEA: Use Partitioning and Expansion Augmentation (optional).
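The options above could be declared with argparse roughly as shown here. This is a hypothetical sketch rather than the parser used in dosser.py: the argument types, the choice to make --aux_path required, and the defaults (copied from the example command) are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented command-line options."""
    p = argparse.ArgumentParser(description="DP dataset distillation (sketch)")
    # Defaults below are taken from the example command, not from dosser.py.
    p.add_argument("--sampling_iteration", type=int, default=1000,
                   help="Number of sampling iterations")
    p.add_argument("--training_iteration", type=int, default=200000,
                   help="Number of optimization iterations")
    p.add_argument("--dataset", type=str, default="CIFAR10",
                   help="Dataset to be used")
    p.add_argument("--aux_path", type=str, required=True,
                   help="Path to the auxiliary dataset")
    p.add_argument("--aux_ipc", type=int, default=100,
                   help="Images per class for the auxiliary dataset")
    p.add_argument("--ser_dim", type=int, default=1000,
                   help="Dimension of the subspace for SER")
    # Boolean switches: present on the command line means enabled.
    p.add_argument("--SER", action="store_true",
                   help="Enable Subspace discovery for Error Reduction")
    p.add_argument("--PEA", action="store_true",
                   help="Use Partitioning and Expansion Augmentation")
    return p


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```

Declaring --SER and --PEA with store_true matches how they appear in the example command, where they are passed without a value.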