Welcome
My name is Catherine Arnett. I’m currently a Linguistics PhD Candidate at UC San Diego and a research scientist at PleIAs. My main research interest is multilingual NLP.
To contact me, you can email me at ccarnett [at] ucsd [dot] edu or find me on Twitter. You can find my work on ResearchGate, ORCID, and Google Scholar. I theoretically post my code on GitHub.
News
- Two of my papers were accepted to EMNLP:
  - “BPE Gets Picky: Efficient Vocabulary Refinement During Tokenizer Training”, which was done with my colleagues at PleIAs
  - “When is Multilinguality a Curse? Language Modeling for 250 High- and Low-Resource Languages”, which was done with my collaborators at UCSD CogSci
- Tyler Chang and I released Goldfish Models, small comparable monolingual models for 350 languages. Check out the Twitter thread overview of the release.
- Our paper, “Revenge of the Fallen? Recurrent Models Match Transformers at Predicting Human Language Comprehension Metrics”, was accepted and will be presented at the first Conference on Language Modeling (COLM).