Clippers 9/26: Christian Clark on categorial grammar induction

Toward Categorial Grammar Induction Using Predicate Co-occurrences from RoBERTa

Recent experiments with large language models (LLMs) have produced tantalizing evidence that innate knowledge is not needed to acquire language. Even so, LLMs do not directly reveal what categories and rules are learned, limiting their utility in explaining human language acquisition. Grammar induction models, in contrast, provide a more explicit means of exploring questions about learnability. Recent work has achieved advances in unsupervised induction of probabilistic context-free grammars (PCFGs). However, categorial grammar induction has received less recent attention, despite appealing properties such as its transparent syntax–semantics interface. Motivated by this, I will present a set of experiments using a new model that induces a basic categorial grammar. I will also describe some first steps toward an extension to the model that will incorporate predicate co-occurrence information extracted from RoBERTa, as a means of leveraging world knowledge from an LLM within a model that learns explicit rules. I am especially interested in hearing the group’s suggestions for this ongoing work.
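For concreteness, one simple way such predicate co-occurrence information could be extracted, though not necessarily the method used in this work, is to query RoBERTa's masked-language-modeling head with a templated sentence and read off the probability of an argument word given a predicate. A minimal sketch, assuming roberta-base via the HuggingFace transformers library; the template and word choices are purely illustrative:

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Hypothetical sketch (not necessarily the talk's actual method): score how
# strongly RoBERTa associates a verbal predicate with a nominal argument by
# masking the argument in a simple template and reading off its log-probability.
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")
model.eval()

@torch.no_grad()
def cooccurrence_score(predicate: str, argument: str) -> float:
    """Log P(argument | template containing predicate), single-token arguments only."""
    text = f"The {tokenizer.mask_token} can {predicate}."  # illustrative template
    ids = tokenizer(text, return_tensors="pt")
    mask_pos = (ids["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
    # RoBERTa's BPE vocabulary marks word-initial tokens with a leading space.
    arg_ids = tokenizer(f" {argument}", add_special_tokens=False)["input_ids"]
    assert len(arg_ids) == 1, "this sketch handles single-token arguments only"
    logits = model(**ids).logits[0, mask_pos]
    return torch.log_softmax(logits, dim=-1)[arg_ids[0]].item()

# One would expect, e.g., cooccurrence_score("bark", "dog")
# to exceed cooccurrence_score("bark", "table").
```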

Clippers 9/19: Byung-Doh Oh on the bigger-is-worse effect of LLM surprisal

A feature attribution analysis of the bigger-is-worse effect of large language model surprisal

Byung-Doh Oh, William Schuler

Recent studies have consistently shown that surprisal estimates from ‘bigger’ large language model (LLM) variants with more parameters and lower perplexity are less predictive of the comprehension difficulty that manifests in human reading times, highlighting a fundamental mismatch between the mechanistic processes underlying LLMs and human sentence processing. This work will present preliminary results from a feature attribution analysis that sheds light on this systematic divergence by examining how different LLM variants leverage identical context tokens. Preliminary observations suggest that 1) perturbation-based feature attribution methods and 2) feature interactions over multiple tokens may be more appropriate for examining bigger LLM variants.
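As background, a word's surprisal is its negative log-probability given the preceding context, and a perturbation-based feature attribution asks how much each context token contributes, for example by deleting it and measuring the change in surprisal. A minimal sketch of both ideas, assuming GPT-2 via the HuggingFace transformers library; the study's actual models and attribution method may differ:

```python
import math
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Illustrative sketch only: per-token surprisal under a causal LM, plus a
# simple leave-one-out perturbation attribution over the context tokens.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

@torch.no_grad()
def surprisal(context_ids: list[int], target_id: int) -> float:
    """-log2 P(target | context) under the causal LM, in bits."""
    logits = model(torch.tensor([context_ids])).logits[0, -1]
    logprobs = torch.log_softmax(logits, dim=-1)
    return -logprobs[target_id].item() / math.log(2)

def leave_one_out_attribution(text: str) -> list[tuple[str, float]]:
    """Change in final-token surprisal when each context token is deleted."""
    ids = tokenizer(text)["input_ids"]
    context, target = ids[:-1], ids[-1]
    base = surprisal(context, target)
    scores = []
    for i in range(len(context)):
        perturbed = context[:i] + context[i + 1:]
        if not perturbed:  # skip the degenerate empty-context case
            continue
        scores.append((tokenizer.decode(context[i]),
                       surprisal(perturbed, target) - base))
    return scores
```

A positive score for a context token means deleting it makes the final word more surprising, i.e., the model was leaning on that token to predict it.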

Clippers 9/5: Michael White on Bootstrapping a Conversational Guide for Colonoscopy Prep (Arya et al., SIGDIAL-23)

Pulkit Arya, Madeleine Bloomquist, Subhankar Chakraborty, Andrew Perrault, William Schuler, Eric Fosler-Lussier, and Michael White. 2023. Bootstrapping a Conversational Guide for Colonoscopy Prep. To appear in Proc. SIGDIAL-23.

Creating conversational systems for niche domains is a challenging task, further exacerbated by a lack of quality datasets. We explore the construction of safer conversational systems for guiding patients in preparing for colonoscopies. This has required a data generation pipeline to generate a minimum viable dataset for bootstrapping a semantic parser, augmented by automatic paraphrasing. Our study suggests that large language models (e.g., GPT-3.5 and GPT-4) are a viable alternative to crowdsourced paraphrasing, but conversational systems that rely on language models’ ability to do temporal reasoning struggle to provide accurate responses. A neural-symbolic system that performs temporal reasoning on an intermediate representation of user queries shows promising results compared to an end-to-end dialogue system, improving the number of correct responses while vastly reducing the number of incorrect or misleading ones.
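To give a flavor of the neural-symbolic idea (the paper's actual intermediate representation and rules are not reproduced here), a hypothetical minimal sketch: a semantic parser maps a patient query to a symbolic form, and the answer comes from deterministic date arithmetic over the prep schedule rather than from the language model itself. The intent labels and the 24-hour cutoff below are assumptions for illustration only.

```python
from datetime import datetime, timedelta

# Hypothetical prep rule for illustration: clear liquids only,
# starting 24 hours before the procedure.
CLEAR_LIQUIDS_HOURS_BEFORE = 24

def answer(parse: dict, procedure_time: datetime, now: datetime) -> str:
    """Answer a parsed temporal query against the prep schedule."""
    if parse == {"intent": "ask_allowed", "item": "solid_food"}:
        cutoff = procedure_time - timedelta(hours=CLEAR_LIQUIDS_HOURS_BEFORE)
        if now < cutoff:
            return f"Yes, solid food is allowed until {cutoff:%A %I:%M %p}."
        return "No, you should only have clear liquids until your procedure."
    return "Sorry, I can't answer that yet."

# For "Can I still eat solid food?", a semantic parser might produce
# {"intent": "ask_allowed", "item": "solid_food"}, which is then answered
# by the date arithmetic above rather than by the language model.
```

The design point is that temporal facts are computed symbolically, so a misleading answer would require a parsing error rather than an LM reasoning error.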