Projects

Language acquisition

I am interested in computational modeling of infant first-language acquisition, particularly phonetics, phonology and morphology. My recent work with Naomi Feldman includes laboratory phonetic analysis of child input and child production data, and computational models of acquisition (in particular neural networks). (See SCiL-19, BUCLD-19 and BUCLD-15 for data analysis, NAACL-19 and EMNLP-17 for neural net models, and Interspeech-17 for a forced-aligned corpus of English child-directed speech.)

Here’s a recent talk (OSU Eye and Ear Institute, 2019) [PDF]

Language and vision

In collaboration with Alasdair Clarke and Hannah Rohde, I am looking at how visual perception affects production and perception of referring expressions. To do so, we have collected and are analyzing a corpus of descriptions of people in Where’s Wally images (he’s called Waldo in the USA!). (See CogSci-17, Frontiers-13 and slides for TTI Speech and Language Day 2013. The WREC dataset is available for download.)

Here’s a recent talk (Cornell, 2019) [PDF]

Inflectional morphology

I’m working on the implicational structure of morphological paradigms with Andrea Sims. We are focusing on the ways in which morphological systems can be internally predictable along many different dimensions. Even when different words take very different morphological markers, they can still be similar in the abstract arrangement of those markers, and this can help humans or computer systems learn how to inflect. (See JLM-19 for a survey, and this 2021 talk given at Georgia Tech for some project details.)

Local discourse coherence and chat disentanglement

My thesis work focused on discourse coherence: the way a document or conversation is structured to provide context for new information. I constructed models that look at where and how entities (things in the world) are mentioned in a text. I also showed that these models can be used to disentangle the different threads of conversation going on in a crowded chat room. (See ACL 11 a and b, CL 10, ACL 08 a and b, NAACL 07, coherence toolkit software and Cambridge 2012 talk.)