Welcome
My name is Catherine Arnett. I'm an NLP Researcher at EleutherAI. I am interested in cross-lingual and multilingual NLP. I received my PhD in Linguistics with a specialization in Computational Social Science at UC San Diego.
To contact me, you can email me at catherine [dot] arnett [at] gmail [dot] com or find me on BlueSky or Twitter.
Other links:
HuggingFace
GitHub
Google Scholar
Semantic Scholar
Orcid
News
- Our new benchmark, Global PIQA, is out in collaboration with over 300 authors. See the preprint or use the dataset now!
- I have a new blog post out called “There is no such thing as a tokenizer-free lunch”.
- My paper with Tyler Chang, Stella Biderman, and Ben Bergen got accepted to NeurIPS! The preprint is out now!
- My paper with Sander Land, BPE Stays on SCRIPT: Structured Encoding for Robust Multilingual Pretokenization, won Best Paper at the first Tokenization Workshop at ICML 2025!