Welcome
My name is Catherine Arnett. I’m currently Lead Research Scientist at PleIAs and am completing my PhD in Linguistics with Computational Social Science at UC San Diego. My main research interest is multilingual NLP.
To contact me, you can email me at catherine [at] pleias [dot] fr or find me on 🦋 BlueSky.
Other links:
HuggingFace, GitHub, Google Scholar, Semantic Scholar, ORCID
News
- I will be representing PleIAs at the First Conference of the International Association for Safe & Ethical AI, as part of the AI Action Summit in Paris
- My paper “Why do language models perform worse for morphologically complex languages?” has been accepted to COLING 2025!
- My paper with Tyler Chang, “When is Multilinguality a Curse? Language Modeling for 250 High- and Low-Resource Languages”, was awarded Outstanding Paper at EMNLP!
- I have a new pre-print about toxicity detection in multilingual and historical text data: Toxicity of the Commons: Curating Open-Source Pre-Training Data. Read the blog post for an overview.
- Tyler Chang and I released Goldfish Models, small comparable monolingual models for 350 languages. Check out the Twitter thread overview of the release.