Welcome
My name is Catherine Arnett. I'm an NLP Researcher at EleutherAI. I am mainly interested in cross-lingual and multilingual NLP (see the research page for more information). I recently finished my PhD in Linguistics with a specialization in Computational Social Science at UC San Diego. I was previously Lead Research Scientist at PleIAs.
To contact me, you can email me at catherine [dot] arnett [at] gmail [dot] com or find me on BlueSky.
Other links:
HuggingFace
GitHub
Google Scholar
Semantic Scholar
Orcid
News
- My paper "Why do language models perform worse for morphologically complex languages?" was awarded Best Paper at COLING 2025!
- My paper with Tyler Chang, "When is Multilinguality a Curse? Language Modeling for 250 High- and Low-Resource Languages," was awarded Outstanding Paper at EMNLP!
- I have a new preprint about toxicity detection in multilingual and historical text data: Toxicity of the Commons: Curating Open-Source Pre-Training Data. Read the blog post for an overview.
- Tyler Chang and I released the Goldfish Models, small, comparable monolingual models for 350 languages. Check out the Twitter thread for an overview of the release.