ICLR 2025 Spotlight Paper
[Link to Paper](https://openreview.net/forum?id=MGKDBuyv4p)
Language models (LMs) can "memorize" information, i.e., encode training data in their weights in such a way that inference-time queries can lead to verbatim regurgitation of that data. This ability to extract training data can be problematic, for example, when data are private or sensitive. In this work, we investigate methods to mitigate memorization: three regularizer-based, three fine-tuning-based, and eleven machine unlearning-based methods, with five of the latter being new methods that we introduce. We also introduce TinyMem, a suite of small, computationally-efficient LMs for the rapid development and evaluation of memorization-mitigation methods. We demonstrate that the mitigation methods that we develop using TinyMem can successfully be applied to production-grade LMs, and we determine via experiment that: regularizer-based mitigation methods are slow and ineffective at curbing memorization; fine-tuning-based methods are effective at curbing memorization, but overly expensive, especially for retaining higher accuracies; and unlearning-based methods are faster and more effective, allowing for the precise localization and removal of memorized information from LM weights prior to inference. We show, in particular, that our proposed unlearning method BalancedSubnet outperforms other mitigation methods at removing memorized information while preserving performance on target tasks.
Loss landscapes for the Pythia 2.8B model. (a) The original model's landscape; the model has memorized content.
(b) A well-edited model's landscape, produced by BalancedSubnet with well-configured hyperparameters (HPs); memorization is reduced and model performance is preserved.
(c) A badly edited model's landscape, produced by Subnet with poorly configured HPs; memorization is reduced, but model performance is not preserved.
While the good edit leaves the landscape largely unchanged, the bad edit alters it drastically.
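The `vis` directory (described below) has directions for generating these plots. For orientation, here is a minimal, self-contained sketch of the standard two-random-direction loss-landscape probe; it is illustrative only, and the function name and the `loss_fn(model, batch)` convention are our own, not this repo's API:

```python
import torch

def loss_landscape(model, loss_fn, batch, steps=21, span=1.0):
    """Evaluate the loss on a 2D grid around the current weights (illustrative sketch)."""
    params = [p for p in model.parameters() if p.requires_grad]
    origin = [p.detach().clone() for p in params]
    # Two random directions, rescaled per-parameter to the weight norm
    # so that perturbations are comparable across layers.
    dirs = []
    for _ in range(2):
        d = [torch.randn_like(p) for p in origin]
        d = [di * (oi.norm() / (di.norm() + 1e-10)) for di, oi in zip(d, origin)]
        dirs.append(d)
    alphas = torch.linspace(-span, span, steps)
    grid = torch.zeros(steps, steps)
    with torch.no_grad():
        for i, a in enumerate(alphas):
            for j, b in enumerate(alphas):
                for p, o, d1, d2 in zip(params, origin, dirs[0], dirs[1]):
                    p.copy_(o + a * d1 + b * d2)
                grid[i, j] = loss_fn(model, batch).item()
        for p, o in zip(params, origin):  # restore the original weights
            p.copy_(o)
    return grid
```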
We give a high-level overview of the code structure in this repository below. More detailed READMEs can be found in every subdirectory, with pointers to any external repos we utilized or took inspiration from. If there are any questions or concerns, please feel free to open a GitHub issue or email [email protected].
- `memorization_in_toy_models.py` allows you to train TinyMem models that contain noised memorized information. Run `python memorization_in_toy_models.py --help` for usage details.
- `ft_toy_model.py` allows you to further fine-tune TinyMem models. Run `python ft_toy_model.py --help` for usage details.
- `pythia_mem_data` points to the memorized data on which we evaluated the Pythia 2.8B/6.9B models.
- `old_data.py` is how we generate training data for training our TinyMem models. Run `python old_data.py --help` to see the full script arguments. (A hypothetical sketch of such data follows this list.)
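For intuition only, here is a hypothetical sketch of what "noised" synthetic sequence data for TinyMem-style models might look like: sequences follow a simple arithmetic rule, except for a noised fraction whose corrupted positions can only be predicted if memorized verbatim. All names and parameters below are illustrative; `old_data.py` is the authoritative generator:

```python
import numpy as np

def make_sequences(n_seqs=1000, seq_len=20, noise_frac=0.1, vocab=113, seed=0):
    """Generate rule-following sequences, a fraction of which are noised (illustrative)."""
    rng = np.random.default_rng(seed)
    data = []
    for _ in range(n_seqs):
        start, step = rng.integers(1, vocab, size=2)
        seq = (start + step * np.arange(seq_len)) % vocab  # simple arithmetic rule
        if rng.random() < noise_frac:
            # Corrupt a few positions: these sequences are only predictable
            # if the model memorizes them verbatim.
            idx = rng.choice(seq_len, size=seq_len // 4, replace=False)
            seq[idx] = rng.integers(0, vocab, size=idx.size)
        data.append(seq)
    return np.stack(data)
```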
- `localize_memorization.py` is how we apply unlearning strategies to a given trained TinyMem model. Run `python localize_memorization.py --help` for usage details.
- `prod_grad.py` is how we apply unlearning strategies to production-grade models (Pythia 2.8B/6.9B). This script is near-identical to `src/localize/localize_memorization.py`, but with a few key differences to support different (larger) models/data. Run `python prod_grad.py --help` for usage details.
- `localize_hp_sweep.py` is a wrapper around both `src/localize/localize_memorization.py` and `src/localize/prod_grad.py` that enables hyperparameter searches for machine unlearning strategies for both TinyMem and production-grade LMs. Run `python localize_hp_sweep.py --help` for usage details.
- `localize/neuron/` contains implementations of the neuron-based localization strategies. To apply these methods, use `localize_memorization.py` for TinyMem models or `prod_grad.py` for Pythia models.
- `localize/weight/` contains implementations of the weight-based localization strategies. To apply these methods, use `localize_memorization.py` for TinyMem models or `prod_grad.py` for Pythia models. (An illustrative sketch of the weight-based idea follows this list.)
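As a rough illustration of what a weight-based unlearning strategy does, the following hedged sketch scores each weight by its gradient on memorized sequences and zeroes the highest-scoring fraction. This is a simplification for intuition, not the repo's Subnet or BalancedSubnet implementation; the function name and the `loss_fn(model, batch)` convention are hypothetical:

```python
import torch

def prune_memorization_weights(model, loss_fn, mem_batch, frac=0.001):
    """Zero the weights most implicated in the loss on memorized data (illustrative)."""
    model.zero_grad()
    loss_fn(model, mem_batch).backward()
    # Score each weight by |gradient * weight| on the memorized batch.
    scores = torch.cat([
        (p.grad.detach() * p.detach()).abs().flatten()
        for p in model.parameters() if p.grad is not None
    ])
    k = max(1, int(frac * scores.numel()))
    threshold = torch.topk(scores, k).values.min()
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                mask = (p.grad * p).abs() >= threshold
                p[mask] = 0.0  # remove the top-scoring weights
    model.zero_grad()
```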
This directory contains the implementations for all three regularizers we considered in this study.
- `dropout.py` contains the implementation of "example-tied dropout".
- `dropper.py` contains the implementation of "loss truncation".
- `spectral_reg.py` contains the implementation of the "spectral norm regularizer". (A generic sketch of the idea follows this list.)
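For intuition, below is a minimal sketch of a spectral-norm penalty in the general style of Yoshida & Miyato (2017): power iteration estimates the largest singular value of each weight matrix, and its square is added to the training loss. This is not the code in `spectral_reg.py`; all names here are illustrative:

```python
import torch
import torch.nn.functional as F

def spectral_penalty(model, n_iters=5):
    """Sum of squared top singular values over all 2D weight matrices (illustrative)."""
    penalty = 0.0
    for W in model.parameters():
        if W.ndim != 2:
            continue
        # Power iteration (no grad) to estimate the top singular vectors.
        with torch.no_grad():
            u = torch.randn(W.shape[0], device=W.device, dtype=W.dtype)
            for _ in range(n_iters):
                v = F.normalize(W.t() @ u, dim=0)
                u = F.normalize(W @ v, dim=0)
        # sigma_max(W) ~= u^T W v; its square is differentiable in W.
        penalty = penalty + (u @ (W @ v)) ** 2
    return penalty

# Usage sketch: loss = lm_loss + lam * spectral_penalty(model)
```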
- `basic_train` has training and regularizer experiment scripts.
- `ft` has fine-tuning experiment scripts.
- `parsl_localize.py` has unlearning experiment scripts for TinyMem models:

  ```bash
  python parsl_localize.py
  ```

- `parsl_localize_nersc.py` has unlearning experiment scripts for TinyMem models, for running on NERSC:

  ```bash
  python parsl_localize_nersc.py
  ```
- `vis` contains directions for generating loss landscapes.
- `csv` contains the results for the TinyMem unlearning experiments.
- Any `*.pdf` is a figure that we created (not all figures are final and included in the paper); see the paper for the latest figures.
- `Visualization_Notebook.ipynb` details how to load a trained TinyMem model and run inference on it. (A related Pythia-based sketch follows this list.)
- `all_pythia_unlearning_runs.csv` contains all of the unlearning results for the Pythia models.
- `best_pythia_unlearning_runs.csv` contains the best HP run for each unlearning result for the Pythia models.
- `pythia_unlearning_results.ipynb` contains code to process all of the Pythia unlearning experimental results.
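As a companion to the notebook, here is a hedged sketch of how one might check for verbatim regurgitation in a production-grade model using Hugging Face `transformers` (the checkpoint name is a real Pythia release; the helper function and its prefix/suffix lengths are our own illustrative choices, not this repo's evaluation code):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "EleutherAI/pythia-2.8b"  # swap in an edited model path as needed
tok = AutoTokenizer.from_pretrained(MODEL)
# device_map="auto" requires the `accelerate` package.
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.float16, device_map="auto"
)

def regurgitates(text, prefix_len=32, suffix_len=32):
    """Greedy-decode from a prefix and test for a verbatim continuation (illustrative)."""
    ids = tok(text, return_tensors="pt").input_ids.to(model.device)
    prefix = ids[:, :prefix_len]
    target = ids[:, prefix_len:prefix_len + suffix_len]
    with torch.no_grad():
        out = model.generate(prefix, max_new_tokens=suffix_len, do_sample=False)
    return torch.equal(out[:, prefix_len:prefix_len + suffix_len], target)
```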
Requirements:

- Python >=3.7,<3.11

```bash
git clone https://github.com/msakarvadia/memorization.git
cd memorization
conda create -p env python=3.10
conda activate ./env
pip install -r requirements.txt
pip install -e .
```

To maintain consistent formatting, we take advantage of Black via pre-commit hooks.
Some user-side configuration is needed, namely the following steps:

- Install Black via `pip install black` (included in `requirements.txt`).
- Install pre-commit via `pip install pre-commit` (included in `requirements.txt`).
- Run `pre-commit install` to set up the pre-commit hooks.
Once these steps are done, you just need to add files to be committed and pushed; the hook will reformat any Python file that does not meet Black's expectations and remove it from the commit. Simply re-commit the changes and they will be included in the commit before pushing.
Please cite this work as:
```bibtex
@inproceedings{sakarvadia2025mitigating,
  title={Mitigating Memorization In Language Models},
  author={Sakarvadia, Mansi and Ajith, Aswathy and Khan, Arham and Hudson, Nathaniel and Geniesse, Caleb and Chard, Kyle and Yang, Yaoqing and Foster, Ian and Mahoney, Michael},
  booktitle={International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=MGKDBuyv4p}
}
```