ICLR 2025 Spotlight Paper
[Link to Paper](https://openreview.net/forum?id=MGKDBuyv4p)
Language models (LMs) can "memorize" information, i.e., encode training data in their weights in such a way that inference-time queries can lead to verbatim regurgitation of that data. This ability to extract training data can be problematic, for example, when data are private or sensitive. In this work, we investigate methods to mitigate memorization: three regularizer-based, three fine-tuning-based, and eleven machine unlearning-based methods, with five of the latter being new methods that we introduce. We also introduce TinyMem, a suite of small, computationally-efficient LMs for the rapid development and evaluation of memorization-mitigation methods. We demonstrate that the mitigation methods that we develop using TinyMem can successfully be applied to production-grade LMs, and we determine via experiment that: regularizer-based mitigation methods are slow and ineffective at curbing memorization; fine-tuning-based methods are effective at curbing memorization, but overly expensive, especially for retaining higher accuracies; and unlearning-based methods are faster and more effective, allowing for the precise localization and removal of memorized information from LM weights prior to inference. We show, in particular, that our proposed unlearning method BalancedSubnet outperforms other mitigation methods at removing memorized information while preserving performance on target tasks.
Loss landscapes for the Pythia 2.8B model. (a) The original model's landscape; the model has memorized content.
(b) A well-edited model's landscape, produced by BalancedSubnet with well-configured hyperparameters (HPs); memorization is reduced and model performance is preserved.
(c) A badly edited model's landscape, produced by Subnet with poorly configured HPs; memorization is reduced, but model performance is not preserved.
While the good edit leaves the landscape largely unchanged, the bad edit alters it drastically.
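The `vis` directory (described below) has directions for generating these plots. For orientation, here is a minimal, self-contained sketch of the standard two-random-direction loss-landscape probe; it is illustrative only, and the function name and the `loss_fn(model, batch)` convention are our own, not this repo's API:

```python
import torch

def loss_landscape(model, loss_fn, batch, steps=21, span=1.0):
    """Evaluate the loss on a 2D grid around the current weights (illustrative sketch)."""
    params = [p for p in model.parameters() if p.requires_grad]
    origin = [p.detach().clone() for p in params]
    # Two random directions, rescaled per-parameter to the weight norm
    # so that perturbations are comparable across layers.
    dirs = []
    for _ in range(2):
        d = [torch.randn_like(p) for p in origin]
        d = [di * (oi.norm() / (di.norm() + 1e-10)) for di, oi in zip(d, origin)]
        dirs.append(d)
    alphas = torch.linspace(-span, span, steps)
    grid = torch.zeros(steps, steps)
    with torch.no_grad():
        for i, a in enumerate(alphas):
            for j, b in enumerate(alphas):
                for p, o, d1, d2 in zip(params, origin, dirs[0], dirs[1]):
                    p.copy_(o + a * d1 + b * d2)
                grid[i, j] = loss_fn(model, batch).item()
        for p, o in zip(params, origin):  # restore the original weights
            p.copy_(o)
    return grid
```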
We give a high-level overview of the code structure in this repository below. More detailed READMEs can be found in every subdirectory, with pointers to any external repos we utilized or took inspiration from. If there are any questions or concerns, please feel free to open a GitHub issue or email [email protected].
- `memorization_in_toy_models.py` allows you to train TinyMem models that contain noised memorized information. Run `python memorization_in_toy_models.py --help` for usage details.
- `ft_toy_model.py` allows you to further fine-tune TinyMem models. Run `python ft_toy_model.py --help` for usage details.
- `pythia_mem_data` points to the memorized data on which we evaluated the Pythia 2.8B/6.9B models.
- `old_data.py` is how we generate training data for training our TinyMem models. Run `python old_data.py --help` to see the full script arguments. (A hypothetical sketch of such data follows this list.)
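For intuition only, here is a hypothetical sketch of what "noised" synthetic sequence data for TinyMem-style models might look like: sequences follow a simple arithmetic rule, except for a noised fraction whose corrupted positions can only be predicted if memorized verbatim. All names and parameters below are illustrative; `old_data.py` is the authoritative generator:

```python
import numpy as np

def make_sequences(n_seqs=1000, seq_len=20, noise_frac=0.1, vocab=113, seed=0):
    """Generate rule-following sequences, a fraction of which are noised (illustrative)."""
    rng = np.random.default_rng(seed)
    data = []
    for _ in range(n_seqs):
        start, step = rng.integers(1, vocab, size=2)
        seq = (start + step * np.arange(seq_len)) % vocab  # simple arithmetic rule
        if rng.random() < noise_frac:
            # Corrupt a few positions: these sequences are only predictable
            # if the model memorizes them verbatim.
            idx = rng.choice(seq_len, size=seq_len // 4, replace=False)
            seq[idx] = rng.integers(0, vocab, size=idx.size)
        data.append(seq)
    return np.stack(data)
```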
- `localize_memorization.py` is how we apply unlearning strategies to a given trained TinyMem model. Run `python localize_memorization.py --help` for usage details.
- `prod_grad.py` is how we apply unlearning strategies to production-grade models (Pythia 2.8B/6.9B). This script is near-identical to `src/localize/localize_memorization.py`, but with a few key differences to support different (larger) models/data. Run `python prod_grad.py --help` for usage details.
- `localize_hp_sweep.py` is a wrapper around both `src/localize/localize_memorization.py` and `src/localize/prod_grad.py` that enables hyperparameter searches for machine unlearning strategies for both TinyMem and production-grade LMs. Run `python localize_hp_sweep.py --help` for usage details.
- `localize/neuron/` contains implementations of the neuron-based localization strategies. To apply these methods, use `localize_memorization.py` for TinyMem models or `prod_grad.py` for Pythia models.
- `localize/weight/` contains implementations of the weight-based localization strategies. To apply these methods, use `localize_memorization.py` for TinyMem models or `prod_grad.py` for Pythia models. (An illustrative sketch of the weight-based idea follows this list.)
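As a rough illustration of what a weight-based unlearning strategy does, the following hedged sketch scores each weight by its gradient on memorized sequences and zeroes the highest-scoring fraction. This is a simplification for intuition, not the repo's Subnet or BalancedSubnet implementation; the function name and the `loss_fn(model, batch)` convention are hypothetical:

```python
import torch

def prune_memorization_weights(model, loss_fn, mem_batch, frac=0.001):
    """Zero the weights most implicated in the loss on memorized data (illustrative)."""
    model.zero_grad()
    loss_fn(model, mem_batch).backward()
    # Score each weight by |gradient * weight| on the memorized batch.
    scores = torch.cat([
        (p.grad.detach() * p.detach()).abs().flatten()
        for p in model.parameters() if p.grad is not None
    ])
    k = max(1, int(frac * scores.numel()))
    threshold = torch.topk(scores, k).values.min()
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                mask = (p.grad * p).abs() >= threshold
                p[mask] = 0.0  # remove the top-scoring weights
    model.zero_grad()
```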
This directory contains the implementations for all three regularizers we considered in this study.
- `dropout.py` contains the implementation of "example-tied dropout".
- `dropper.py` contains the implementation of "loss truncation".
- `spectral_reg.py` contains the implementation of the "spectral norm regularizer". (A generic sketch of the idea follows this list.)
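For intuition, below is a minimal sketch of a spectral-norm penalty in the general style of Yoshida & Miyato (2017): power iteration estimates the largest singular value of each weight matrix, and its square is added to the training loss. This is not the code in `spectral_reg.py`; all names here are illustrative:

```python
import torch
import torch.nn.functional as F

def spectral_penalty(model, n_iters=5):
    """Sum of squared top singular values over all 2D weight matrices (illustrative)."""
    penalty = 0.0
    for W in model.parameters():
        if W.ndim != 2:
            continue
        # Power iteration (no grad) to estimate the top singular vectors.
        with torch.no_grad():
            u = torch.randn(W.shape[0], device=W.device, dtype=W.dtype)
            for _ in range(n_iters):
                v = F.normalize(W.t() @ u, dim=0)
                u = F.normalize(W @ v, dim=0)
        # sigma_max(W) ~= u^T W v; its square is differentiable in W.
        penalty = penalty + (u @ (W @ v)) ** 2
    return penalty

# Usage sketch: loss = lm_loss + lam * spectral_penalty(model)
```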
- `basic_train` has training and regularizer experiment scripts.
- `ft` has fine-tuning experiment scripts.
- `parsl_localize.py` has unlearning experiment scripts for TinyMem models:

  ```bash
  python parsl_localize.py
  ```

- `parsl_localize_nersc.py` has unlearning experiment scripts for TinyMem models, for running on NERSC:

  ```bash
  python parsl_localize_nersc.py
  ```
- `vis` contains directions for generating loss landscapes.
- `csv` contains the results for the TinyMem unlearning experiments.
- Any `*.pdf` is a figure that we created (not all figures are final and included in the paper); see the paper for the latest figures.
- `Visualization_Notebook.ipynb` details how to load a trained TinyMem model and run inference on it. (A related Pythia-based sketch follows this list.)
- `all_pythia_unlearning_runs.csv` contains all of the unlearning results for the Pythia models.
- `best_pythia_unlearning_runs.csv` contains the best HP run for each unlearning result for the Pythia models.
- `pythia_unlearning_results.ipynb` contains code to process all of the Pythia unlearning experimental results.
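As a companion to the notebook, here is a hedged sketch of how one might check for verbatim regurgitation in a production-grade model using Hugging Face `transformers` (the checkpoint name is a real Pythia release; the helper function and its prefix/suffix lengths are our own illustrative choices, not this repo's evaluation code):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "EleutherAI/pythia-2.8b"  # swap in an edited model path as needed
tok = AutoTokenizer.from_pretrained(MODEL)
# device_map="auto" requires the `accelerate` package.
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.float16, device_map="auto"
)

def regurgitates(text, prefix_len=32, suffix_len=32):
    """Greedy-decode from a prefix and test for a verbatim continuation (illustrative)."""
    ids = tok(text, return_tensors="pt").input_ids.to(model.device)
    prefix = ids[:, :prefix_len]
    target = ids[:, prefix_len:prefix_len + suffix_len]
    with torch.no_grad():
        out = model.generate(prefix, max_new_tokens=suffix_len, do_sample=False)
    return torch.equal(out[:, prefix_len:prefix_len + suffix_len], target)
```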
Requirements:

- Python >=3.7,<3.11

```bash
git clone https://github.com/msakarvadia/memorization.git
cd memorization
conda create -p env python=3.10
conda activate ./env
pip install -r requirements.txt
pip install -e .
```

To maintain consistent formatting, we take advantage of Black via pre-commit hooks.
Some user-side configuration is needed, namely the following steps:

- Install Black via `pip install black` (included in `requirements.txt`).
- Install pre-commit via `pip install pre-commit` (included in `requirements.txt`).
- Run `pre-commit install` to set up the pre-commit hooks.
Once these steps are done, you just need to add files to be committed and pushed; the hook will reformat any Python file that does not meet Black's expectations and remove it from the commit. Simply re-commit the changes and they will be included in the commit before pushing.
Please cite this work as:
```bibtex
@inproceedings{sakarvadia2025mitigating,
  title={Mitigating Memorization In Language Models},
  author={Sakarvadia, Mansi and Ajith, Aswathy and Khan, Arham and Hudson, Nathaniel and Geniesse, Caleb and Chard, Kyle and Yang, Yaoqing and Foster, Ian and Mahoney, Michael},
  booktitle={International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=MGKDBuyv4p}
}
```