Priors in Time: Missing Inductive Biases for Language Model Interpretability
Ekdeep Singh Lubana, Can Rager, Sai Sumedh R. Hindupur, Valerie Costa, Greta Tuckute, Oam Patel, Sonia Krishna Murthy, Thomas Fel, Daniel Wurgaft, Eric J. Bigelow, Johnny Lin, Demba Ba, Martin Wattenberg, Fernanda Viegas, Melanie Weber, Aaron Mueller.
arXiv preprint. [paper] [code]
From Isolation to Entanglement: When Do Interpretability Methods Identify and Disentangle Known Concepts?
Aaron Mueller, Andrew Lee, Shruti Joshi, Ekdeep Singh Lubana, Dhanya Sridhar, Patrik Reizinger.
arXiv preprint. [paper]
In-context Learning Without Copying
Kerem Sahin, Sheridan Feucht, Adam Belfki, Jannik Brinkmann, Aaron Mueller, David Bau, Chris Wendler.
arXiv preprint. [paper] [code]
Crosscoding Through Time: Tracking Emergence & Consolidation Of Linguistic Representations Throughout LLM Pretraining
Deniz Bayazit, Aaron Mueller, Antoine Bosselut.
arXiv preprint. [paper] [code]
CRISP: Persistent Concept Unlearning via Sparse Autoencoders
Tomer Ashuach, Dana Arad, Aaron Mueller, Martin Tutek, Yonatan Belinkov.
arXiv preprint. [paper]
The Quest for the Right Mediator: Surveying Mechanistic Interpretability Through the Lens of Causal Mediation Analysis
Aaron Mueller, Jannik Brinkmann, Millicent Li, Samuel Marks, Koyena Pal, Nikhil Prakash, Can Rager, Aruna Sankaranarayanan, Arnab Sen Sharma, Jiuding Sun, Eric Todd, David Bau, Yonatan Belinkov.
Computational Linguistics. [paper]
Bigger is not always better: The importance of human-scale language modeling for psycholinguistics
Ethan Gotlieb Wilcox, Michael Hu, Aaron Mueller, Tal Linzen, Alex Warstadt, Leshem Choshen, Chengxu Zhuang, Ryan Cotterell, Adina Williams. Journal of Memory and Language (JML). [paper]
MIB: A Mechanistic Interpretability Benchmark
Aaron Mueller*, Atticus Geiger*, Sarah Wiegreffe, Dana Arad, Iván Arcuschin, Adam Belfki, Yik Siu Chan, Jaden Fiotto-Kaufman, Tal Haklay, Michael Hanna, Jing Huang, Rohan Gupta, Yaniv Nikankin, Hadas Orgad, Nikhil Prakash, Anja Reusch, Aruna Sankaranarayanan, Shun Shao, Alessandro Stolfo, Martin Tutek, Amir Zur, David Bau, Yonatan Belinkov. International Conference on Machine Learning (ICML). [website] [paper] [code] [data] [leaderboard]
NNsight and NDIF: Democratizing Access to Foundation Model Internals
Jaden Fiotto-Kaufman, Alexander R Loftus, Eric Todd, Jannik Brinkmann, Caden Juang, Koyena Pal, Can Rager, Aaron Mueller, Samuel Marks, Arnab Sen Sharma, Francesca Lucchetti, Michael Ripa, Adam Belfki, Nikhil Prakash, Sumeet Multani, Carla Brodley, Arjun Guha, Jonathan Bell, Byron Wallace, David Bau. International Conference on Learning Representations (ICLR). [paper] [website] [source]
Language Model Acceptability Judgments Are Not Always Robust to Context (Outstanding Paper Award)
Koustuv Sinha, Jon Gauthier, Aaron Mueller, Kanishka Misra, Keren Fuentes, Roger Levy, Adina Williams. Association for Computational Linguistics (ACL). [paper]
Inverse Scaling: When Bigger Isn't Better (Featured Paper)
Ian R. McKenzie, Alexander Lyzhov, Michael Martin Pieler, Alicia Parrish, Aaron Mueller, Ameya Prabhu, Euan McLean, Xudong Shen, Joe Cavanagh, Andrew George Gritsevskiy, Derik Kauffman, Aaron T. Kirtland, Zhengping Zhou, Yuhui Zhang, Sicong Huang, Daniel Wurgaft, Max Weiss, Alexis Ross, Gabriel Recchia, Alisa Liu, Jiacheng Liu, Tom Tseng, Tomasz Korbak, Najoung Kim, Samuel R. Bowman, Ethan Perez. Transactions on Machine Learning Research (TMLR). [paper]
What Do NLP Researchers Believe? Results of the NLP Community Metasurvey
Julian Michael, Ari Holtzman, Alicia Parrish, Aaron Mueller, Alex Wang, Angelica Chen, Divyam Madaan, Nikita Nangia, Richard Yuanzhe Pang, Jason Phang, Samuel R. Bowman. Association for Computational Linguistics (ACL). [paper]
2022
Coloring the Blank Slate: Pre-training Imparts a Hierarchical Inductive Bias to Sequence-to-sequence Models
Aaron Mueller, Robert Frank, Tal Linzen, Luheng Wang, Sebastian Schuster. Association for Computational Linguistics (ACL). [paper] [code]
Label Semantic Aware Pre-training for Few-shot Text Classification
Aaron Mueller, Jason Krone, Salvatore Romeo, Saab Mansour, Elman Mansimov, Yi Zhang, Dan Roth. Association for Computational Linguistics (ACL). [paper]
Causal Analysis of Syntactic Agreement Mechanisms in Neural Language Models
Aaron Mueller*, Matthew Finlayson*, Sebastian Gehrmann, Stuart Shieber, Tal Linzen, Yonatan Belinkov. Association for Computational Linguistics (ACL). [paper] [code] [bib]
An Analysis of Massively Multilingual Neural Machine Translation for Low-Resource Languages
Aaron Mueller, Garrett Nicolai, Arya D. McCarthy, Dylan Lewis, Winston Wu, David Yarowsky. Language Resources and Evaluation Conference (LREC). [paper] [bib]
The Johns Hopkins University Bible Corpus: 1600+ Tongues for Typological Exploration
Arya D. McCarthy, Rachel Wicks, Dylan Lewis, Aaron Mueller, Winston Wu, Oliver Adams, Garrett Nicolai, Matt Post, David Yarowsky. Language Resources and Evaluation Conference (LREC). [paper] [bib]
Fine-grained Morphosyntactic Analysis and Generation Tools for More Than One Thousand Languages
Garrett Nicolai, Dylan Lewis, Arya D. McCarthy, Aaron Mueller, Winston Wu, David Yarowsky. Language Resources and Evaluation Conference (LREC). [paper] [bib]
Findings of the Second BabyLM Challenge: Sample-efficient Pretraining on Developmentally Plausible Corpora
Michael Y. Hu, Aaron Mueller, Candace Ross, Adina Williams, Tal Linzen, Chengxu Zhuang, Ryan Cotterell, Leshem Choshen, Alex Warstadt, Ethan Gotlieb Wilcox. Proceedings of the shared task at the Conference on Computational Natural Language Learning (CoNLL). [website] [paper]
2023
Findings of the BabyLM Challenge: Sample-efficient Pretraining on Developmentally Plausible Corpora
Alex Warstadt*, Aaron Mueller*, Leshem Choshen, Ethan Wilcox, Chengxu Zhuang, Juan Ciro, Rafael Mosquera, Bhargavi Paranjabe, Adina Williams, Tal Linzen, Ryan Cotterell. Proceedings of the shared task at the Conference on Computational Natural Language Learning (CoNLL). [website] [paper]