Studies of grammar induction are a source of evidence about the mechanisms underlying children’s language acquisition. Manipulating the prior knowledge and inductive biases of grammar inducers can yield insights into the learnability of syntactic structure under various assumptions about the learner. While early induction models often relied on annotated data, more recent models have made progress toward learning from raw data, working with both probabilistic context-free grammars and categorial grammars. Still, the accuracy of current systems falls well below that of human learners.
Incorporating world knowledge into grammar inducers is a potential path toward further improvement, one which is well motivated by psycholinguistic theory (e.g. semantic bootstrapping). Along these lines, I will present a categorial grammar inducer that incorporates semantic knowledge — implemented as association weights between predicate roles — into an existing syntax-only inducer. Associations can be distilled from large language models (LLMs), opening up possibilities not only for better grammar induction but also for exploration of the conceptual knowledge acquired by LLMs. This project is still a work in progress, but I will present some preliminary results on synthetic data and broad-coverage corpora.
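To make the general idea concrete, the following is a minimal, hypothetical sketch of how association weights between predicate roles might be combined with a syntax-only score when ranking candidate attachments. It is not the actual inducer described above: all names (ASSOC, Candidate, combined_score), the interpolation scheme, and the numbers are illustrative assumptions, and in practice the weights would be distilled from an LLM rather than hard-coded.

```python
# Hypothetical sketch: folding predicate-role association weights into a
# syntax-only attachment score. Illustrative only; not the actual inducer.
from dataclasses import dataclass

# Toy association weights between a predicate, a candidate argument, and a
# semantic role. In practice these would be distilled from an LLM.
ASSOC = {
    ("eat", "apple", "patient"): 2.3,
    ("eat", "idea", "patient"): -1.5,
    ("sleep", "apple", "patient"): -2.0,
}

@dataclass
class Candidate:
    predicate: str
    argument: str
    role: str
    syntax_logprob: float  # score from a syntax-only inducer (assumed given)

def assoc_weight(pred: str, arg: str, role: str) -> float:
    """Look up the predicate-role association weight (0.0 if unseen)."""
    return ASSOC.get((pred, arg, role), 0.0)

def combined_score(c: Candidate, lam: float = 0.5) -> float:
    """Interpolate syntactic and semantic evidence in log space."""
    return c.syntax_logprob + lam * assoc_weight(c.predicate, c.argument, c.role)

def rank(candidates):
    """Return candidates sorted by combined score, best first."""
    return sorted(candidates, key=combined_score, reverse=True)

if __name__ == "__main__":
    cands = [
        Candidate("eat", "apple", "patient", syntax_logprob=-3.2),
        Candidate("eat", "idea", "patient", syntax_logprob=-3.0),
    ]
    for c in rank(cands):
        print(c.predicate, c.argument, c.role, round(combined_score(c), 2))
```

Under this toy weighting, the semantically plausible attachment ("eat" + "apple" as patient) outranks the implausible one even though its syntax-only score is slightly worse, which is the kind of effect the semantic association weights are intended to capture.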