Multilingual Language Models

My dissertation work focuses on multilingual language models. Some of the questions I am interested in include:

  • How do multilingual language models represent information for the different languages they were trained on?
  • What are the optimal conditions for crosslingual transfer in multilingual models?
  • How can we improve performance for low-resource languages?

One of my projects uses Structural Priming, an experimental paradigm from psycholinguistics, to find evidence for multilingual abstract grammatical representations in multilingual language models. My collaborators (James Michaelov, Tyler Chang, and Benjamin Bergen) and I presented the full paper “Structural Priming Demonstrates Abstract Grammatical Representations in Multilingual Language Models” at EMNLP 2023. We also presented an extended abstract, “Crosslingual Structural Priming and the Pre-Training Dynamics of Bilingual Language Models”, at the Multilingual Representations Learning Workshop co-located with EMNLP 2023.

Another project I have been working on involves transfer and low-resource languages. My collaborator Tyler Chang and I trained 10,000 language models on 252 languages to investigate the conditions for optimal crosslingual transfer, especially for low-resource languages. We have a manuscript pre-print entitled “When Is Multilinguality a Curse? Language Modeling for 250 High- and Low-Resource Languages”.

Verbal Reduplication in Mandarin

I am interested in the various verbal reduplication patterns in Mandarin Chinese, especially their usage and meaning.

For a brief overview, you can check out my AMLaP and NACCL-32 presentations. For more information, see the Reduplication page.

Language and Environment

My collaborator, Maho Takahashi, and I conducted a reanalysis of prior work on the relationship between language and the environment in which a language is spoken. We found that the correlations are not robust when other factors are taken into consideration.

We presented a poster entitled “Creating a Baseline to Evaluate Correlations Between Language and Environment” [poster] at the Machine Learning for Language Evolution Workshop at the Joint Conference on Language Evolution 2022.

Previous Work

During my undergraduate degree, I worked on the typology of event framing in Romance languages. I used several corpora to examine the change in verb framing from Latin through Medieval French and Spanish to modern Romance varieties.

To learn more, you can read my proceedings paper.

