Clippers 9/19: Byung-Doh Oh on the bigger-is-worse effect of LLM surprisal

A feature attribution analysis of the bigger-is-worse effect of large language model surprisal

Byung-Doh Oh, William Schuler

Recent studies have consistently shown that surprisal estimates from ‘bigger’ large language model (LLM) variants with more parameters and lower perplexity are less predictive of the comprehension difficulty that manifests in human reading times. This highlights a fundamental mismatch between the mechanistic processes underlying LLMs and human sentence processing. This talk will present preliminary results from a feature attribution analysis that sheds light on this systematic divergence by examining how different LLM variants leverage identical context tokens, including observations that 1) perturbation-based feature attribution methods and 2) feature interactions over multiple tokens may be more appropriate for examining bigger LLM variants.
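
As a rough illustration of the first observation, the sketch below implements a simple leave-one-out perturbation attribution for surprisal, assuming the Hugging Face transformers library: each context token is deleted in turn, and the resulting change in the target word's surprisal is taken as that token's attribution score. The model name "gpt2" and the example sentence are illustrative placeholders, not the models or data from the study.

# Minimal sketch of leave-one-out perturbation attribution for surprisal.
# Assumes Hugging Face `transformers`; "gpt2" and the example sentence are
# illustrative placeholders, not the study's models or data.
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def surprisal(ids: torch.Tensor, target_pos: int) -> float:
    """Surprisal (in bits) of the token at target_pos given all preceding tokens."""
    with torch.no_grad():
        logits = model(ids.unsqueeze(0)).logits[0]
    # The distribution over the token at target_pos comes from position target_pos - 1.
    log_probs = torch.log_softmax(logits[target_pos - 1], dim=-1)
    return -log_probs[ids[target_pos]].item() / math.log(2)

def leave_one_out(text: str) -> list[tuple[str, float]]:
    """Attribute the final token's surprisal to each context token by
    deleting that token and measuring the change in surprisal."""
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    target = len(ids) - 1
    base = surprisal(ids, target)
    attributions = []
    for i in range(target):  # ablate each context token in turn
        ablated = torch.cat([ids[:i], ids[i + 1:]])
        # The target index shifts left by one after deleting an earlier token.
        delta = surprisal(ablated, target - 1) - base
        attributions.append((tokenizer.decode([int(ids[i])]), delta))
    return attributions

for token, delta in leave_one_out("The keys to the cabinet are"):
    print(f"{token!r}: {delta:+.2f} bits")

Comparing these per-token deltas across model variants of different sizes is one way to probe whether bigger models distribute their reliance on context differently; the second observation, concerning feature interactions over multiple tokens, would instead require ablating subsets of context tokens jointly rather than one token at a time.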