Clippers 9/26: Christian Clark on categorial grammar induction

Toward Categorial Grammar Induction Using Predicate Co-occurrences from RoBERTa

Recent experiments with large language models (LLMs) have produced tantalizing evidence that innate knowledge is not needed to acquire language. Even so, LLMs do not directly reveal what categories and rules are learned, limiting their utility in explaining human language acquisition. Grammar induction models, in contrast, provide a more explicit means of exploring questions about learnability. Recent work has achieved advances in unsupervised induction of probabilistic context-free grammars (PCFGs). However, categorial grammar induction has received less recent attention, despite its appealing properties such as a transparent syntax–semantics interface. Motivated by this, I will present a set of experiments using a new model that induces a basic categorial grammar. I will also describe some first steps toward an extension to the model that will incorporate predicate co-occurrence information extracted from RoBERTa, as a means of leveraging world knowledge from an LLM within a model that learns explicit rules. I am especially interested in hearing the group’s suggestions for this ongoing work.