About AI4CommSci

The rapid advances in artificial intelligence don’t just change how we interact with our smart speakers, choose TV shows to watch, or search for information online. They also change what kind of science is possible and expand what is knowable. Established in 2024, our group’s vision is to push the envelope of what is possible, making breakthroughs in theory and practice for the communication sciences. In addition to our own research, we seek to enhance the productivity of the field as a whole, both by training junior scholars and by disseminating methods, algorithms, and technology. Below, we highlight a few major threads in our research. Space considerations mean leaving out a lot of cool stuff; a full list of publications can be found here. Another good source is the extensive media coverage of our work, compiled here.

Critical Periods for Learning a Second Language

Language Learning Ability Chart
A plot of the percentage of a language’s grammar a monolingual learner can learn as a function of age, assuming they speak only that language. (For instance, a 1-year-old is estimated to learn about 16% of the language’s grammar in a year.) Most people learning a second language still speak their first language much of the time, and so will learn more slowly. From Chen & Hartshorne (2021).

People who learned a second language in childhood are difficult to distinguish from native speakers, whereas those who began in adulthood are often saddled with an accent and conspicuous grammatical errors. Although this is one of the oldest findings in the communication sciences, the reasons remain a mystery. For one thing, until very recently, it was unclear at exactly what age language learning becomes more difficult. By combining a dataset of unprecedented size (>600,000 people) with a novel analytic model, we showed that the rate of grammar learning declines dramatically in late adolescence.

This is despite the fact that older learners have the advantage of having already learned a first language. In a recent study, we showed that older children actually learn a new language more rapidly than younger children — but in proportion to how similar their first language is to the new language. (We do not yet know how this “linguistic transfer” affects the rate of learning in adults.)

We are currently trying to piece together exactly what is going on through a combination of neuroimaging studies, computational modeling, and new behavioral studies with different types of learners (for example, refugees or learners in immersion schools).

In the neuroimaging work, we are taking advantage of “synthetic twin analysis”, a machine learning-based method we developed with our colleague Stefano Anzellotti at Boston College. This method allows us to quantify variation in neuroanatomy with far more precision than was previously possible. Initial findings suggest that the fundamental nature of the neural basis of second language acquisition does not change until at least late adolescence – consistent with our behavioral findings. We are working to extend this line of research to adults.

A panel from our 2022 paper in Science that introduced synthetic twin analysis. For details, see the original paper.

This work has been supported by a Simons Foundation SFARI grant and an NIH NRSA.

What Makes Humans Such Efficient Language Learners?

Humans are shockingly good at learning language. This has been clear to scientists for some time, as one learning algorithm after another has been shown to be insufficient. Recent Large Language Models only reinforce this point. We learn much faster than they do:

Llama 2 – an open-source competitor to ChatGPT – was trained on more language than a small city of humans would encounter over their lifetimes. In addition, modern Large Language Models require the equivalent of vast amounts of drilling on language – something that humans do not need.
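To make the scale of that comparison concrete, here is a rough back-of-the-envelope calculation. The training-corpus figure (roughly 2 trillion tokens) is the one reported for Llama 2; the per-person exposure numbers are order-of-magnitude assumptions chosen for illustration, not estimates from our studies.

```python
# Rough, order-of-magnitude comparison; the exposure figures below are
# illustrative assumptions, not measurements.

LLAMA2_TRAINING_TOKENS = 2e12   # Llama 2 was pretrained on roughly 2 trillion tokens
WORDS_PER_TOKEN = 0.75          # common rule of thumb for English text
WORDS_HEARD_PER_YEAR = 1e7      # assumed ~10 million words of linguistic input per person per year
YEARS_OF_EXPOSURE = 70          # assumed years of language exposure per person

llama_words = LLAMA2_TRAINING_TOKENS * WORDS_PER_TOKEN
lifetime_words = WORDS_HEARD_PER_YEAR * YEARS_OF_EXPOSURE
equivalent_people = llama_words / lifetime_words

print(f"Llama 2 training data: ~{llama_words:.1e} words")
print(f"One human lifetime:    ~{lifetime_words:.1e} words")
print(f"Equivalent population: ~{equivalent_people:,.0f} people")
# Under these assumptions, the training corpus amounts to the lifetime linguistic
# input of a few thousand people, i.e., roughly a small town or city's worth.
```

The exact number moves around with the assumed exposure rate, but under any reasonable setting the corpus corresponds to thousands of human lifetimes of input.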

The question is why. Answering it is key to helping individuals who struggle with language, whether due to a learning disability, injury, or stroke.

Together with Jesse Snedeker at Harvard, we have proposed an account (Conceptual Nativism) that treats language-learning as a kind of code-breaking. Unlike Llama or ChatGPT, human learners have thoughts and are trying to learn how to express them through language. This is obvious for any adult learner, but work in developmental psychology shows that it is likely true for babies as well. We argue – and are currently trying to show computationally – that this puts strong constraints on learning, vastly increasing the efficiency with which one can learn.

Much of our work to date has focused on testing a central prediction of Conceptual Nativism: even though grammar is often thought to be a set of arbitrary and opaque rules, the grammars of the world’s languages are in fact tightly constrained by meaning. (Grammar seems opaque because we only really notice the parts that are arbitrary, even though such parts are actually rare.) In one case study after another, we have shown that this is indeed the case.
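To give a flavor of the code-breaking logic, here is a deliberately tiny toy simulation of how access to intended meanings can shrink the space of grammars a learner must consider. It illustrates the general idea only: the miniature vocabulary, the three thematic roles, and the six candidate “grammars” are invented for this sketch and are not taken from the models described above.

```python
from itertools import permutations

ROLES = ("agent", "action", "patient")
CANDIDATE_GRAMMARS = list(permutations(ROLES))  # all 6 possible role orderings

# One observed utterance from a language whose true order is agent-action-patient:
# "dog chases cat", where dog = agent, chases = action, cat = patient.
utterance = ("dog", "chases", "cat")
word_category = {"dog": "noun", "chases": "verb", "cat": "noun"}
intended_meaning = {"dog": "agent", "chases": "action", "cat": "patient"}

def consistent_with_form(grammar):
    # Form-only learner: knows nouns from verbs, so only the verb's position
    # constrains which role orderings remain possible.
    verb_position = [word_category[w] for w in utterance].index("verb")
    return grammar.index("action") == verb_position

def consistent_with_meaning(grammar):
    # Meaning-aware learner: knows which role each word expresses, so every
    # position in the utterance constrains the grammar.
    return all(grammar[i] == intended_meaning[w] for i, w in enumerate(utterance))

form_only = [g for g in CANDIDATE_GRAMMARS if consistent_with_form(g)]
with_meaning = [g for g in CANDIDATE_GRAMMARS if consistent_with_meaning(g)]

print("Consistent with the word string alone:", len(form_only))     # 2 grammars remain
print("Consistent with string plus meaning:  ", len(with_meaning))  # 1 grammar remains
```

Even in this three-word world, the meaning-aware learner pins down the word order from a single utterance while the form-only learner cannot; with realistically rich grammars, that gap grows combinatorially.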

Some additional evidence comes from the observation that children acquiring two languages simultaneously are even more efficient than those acquiring only one. Naively, one might expect a bilingual child to learn each language at half the rate of their monolingual peers, but in fact, on a per-language basis, they learn much faster than that. One possibility, which we are currently exploring, is that bilinguals are able to use what they have learned about each language to help learn the other.

By elementary school, learning curves for grammar (A) are nearly indistinguishable for simultaneous bilinguals (N=30,397) and monolinguals (N=246,497; Hartshorne et al., 2018). By middle childhood, learning curves for vocabulary (B) are entirely indistinguishable for simultaneous bilinguals (N=4,207) and monolinguals (N=48,162) (Hua & Hartshorne, in prep). During initial learning, English vocabulary learning by bilinguals (N=646) lags monolinguals (N=11,040) as a function of age (C), but exceeds it as a function of input (D; Hua & Hartshorne, in prep).

In current and planned work, we are building out computational models of Conceptual Nativism. This is an ambitious task, as modeling how people learn to talk about their thoughts about the world requires good models of both thought and the world.

This work has been supported by NSF grants #2033938, #1606285, and #1551834; an NSF CAREER grant (#2238912); and an NIH NRSA.

Scaling the Cognitive and Behavioral Sciences

Research in the communication sciences and other cognitive and behavioral sciences is slow because collecting data is slow. While statisticians routinely report that robust and reliable results require orders of magnitude more data than is typical, researchers often struggle to collect even typical amounts. We were one of a handful of research groups that pioneered collecting vast datasets through online games, quizzes, and citizen science. While many researchers want to run such studies, there is a significant barrier to entry.

Left: We asked cognitive scientists how they would collect data if they needed one thousand, ten thousand, or one hundred thousand subjects. Note that most of our studies involve at least ten thousand subjects. Right: The vast majority of these scientists reported that they wanted to conduct massive online studies (>10,000 subjects), but hardly any had done so. In contrast, while many had run smaller studies on Amazon Mechanical Turk or Prolific, fewer actually wanted to. Data: 322 cognitive scientists surveyed in 2022.

To address that barrier, we have developed free and open-source software (Pushkin) for running massive citizen-science experiments in the cognitive and behavioral sciences. This work has been supported by NSF grants #2229631, #2318474, #2029637, and #1551834.