Abstract:
Recent psycholinguistic research has compared human reading times to surprisal estimates from language models in order to study the factors shaping human sentence processing difficulty. Previous studies have shown that surprisal values from Transformers align with reading times better than those from alternative architectures such as RNNs. However, a standard Transformer retains a lossless representation of the entire previous linguistic context, a property that makes it somewhat implausible as a model of human cognition. To address this limitation, I test a Transformer variant that incorporates ALiBi (Attention with Linear Biases), a set of distance-based penalties added to attention scores that imposes a recency bias. Surprisal estimates from the ALiBi model show an improved fit to human reading times over those from a standard Transformer baseline.
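As a rough illustration of the mechanism summarized above (a sketch of ALiBi as described by Press et al., not the specific implementation evaluated in this paper), the snippet below constructs the per-head linear distance penalty in PyTorch; the function name `alibi_bias` and its signature are illustrative choices of my own.

```python
import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    """Build the ALiBi bias tensor of shape (num_heads, seq_len, seq_len).

    Each head h is assigned a fixed, non-learned slope
    m_h = 2 ** (-8 * (h + 1) / num_heads); attending from query
    position i back to key position j (j <= i) incurs a penalty of
    -m_h * (i - j), so more distant tokens are downweighted linearly.
    """
    slopes = torch.tensor(
        [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]
    )
    positions = torch.arange(seq_len)
    # distance[i, j] = i - j; clamp removes the (masked) upper triangle
    distance = (positions.view(-1, 1) - positions.view(1, -1)).clamp(min=0)
    return -slopes.view(-1, 1, 1) * distance

# The bias is added to raw attention scores before the softmax, e.g.:
# scores = q @ k.transpose(-2, -1) / head_dim ** 0.5
# scores = scores + alibi_bias(num_heads, seq_len)  # then causal mask + softmax
```

Because the penalty grows with distance, attention mass is pushed toward recent tokens, giving the lossy, recency-weighted context representation that motivates the comparison with human reading times.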