My name is Anastasia, and I am interested in mechanistic interpretability and explainability of LLM components. I am also interested in building robust detectors of AI-generated content and investigating the fundamental differences between human-written and LLM-generated texts.
My publications and preprints
- Are AI Detectors Good Enough? A Survey on Quality of Datasets With Machine-Generated Texts
- Listening to the Wise Few: Select-and-Copy Attention Heads for Multiple-Choice QA
- Quantifying Logical Consistency in Transformers via Query-Key Alignment
- Leveraging Transfer Learning for Detecting Boundaries of Machine-Generated Texts