Welcome
My name is Catherine Arnett. I'm currently Lead Research Scientist at PleIAs and I'm finishing my PhD in Linguistics with Computational Social Science at UC San Diego. My main research interest is multilingual NLP.
To contact me, you can email me at catherine [at] pleias [dot] fr or find me on 🦋 BlueSky.
Other links:
HuggingFace GitHub Google Scholar Semantic Scholar Orcid
News
- My paper with Tyler Chang, “When is Multilinguality a Curse? Language Modeling for 250 High- and Low-Resource Languages,” was awarded Outstanding Paper at EMNLP!
- I have a new pre-print about toxicity detection in multilingual and historical text data: Toxicity of the Commons: Curating Open-Source Pre-Training Data. Read the blog post for an overview.
- Tyler Chang and I released Goldfish Models, small comparable monolingual models for 350 languages. Check out the Twitter thread overview of the release.