Clippers 10/11: Willy Cheung on Targeted Linguistic Evaluation of Cataphora

Due to their state-of-the-art performance on natural language processing tasks, large neural language models have garnered significant interest in recent years. To better understand their linguistic abilities, researchers have used the targeted linguistic evaluation paradigm to test neural models in a linguistically controlled manner. Following this line of work, I am interested in investigating how neural models handle cataphora, i.e. when a pronoun precedes what it refers to (e.g. when [he] gets to work, [John] likes to drink a cup of coffee). I will present work that takes stimuli from existing cataphora studies, runs them through GPT-2, and compares the model's results to the experimental data. A number of issues arise in comparing to these existing studies, motivating a new study to collect data better suited to testing neural models. I will show the setup for my pilot experiment and some preliminary results, and end with some ideas for future directions for this work.
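As a rough illustration of what such a targeted evaluation can look like, the sketch below computes per-token GPT-2 surprisals for a pair of cataphoric sentences. This is a minimal example assuming the Hugging Face transformers library; the stimuli are illustrative placeholders, not items from the studies discussed in the talk.

```python
# Minimal sketch: per-token GPT-2 surprisal on illustrative cataphora stimuli.
# Assumes the Hugging Face transformers library; stimuli are hypothetical.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def token_surprisals(sentence):
    """Return (token, surprisal in bits) for each token after the first."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Log-probabilities for predicting token t+1 from the prefix up to t.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    nats = -log_probs[torch.arange(targets.size(0)), targets]
    bits = nats / torch.log(torch.tensor(2.0))  # convert nats to bits
    tokens = tokenizer.convert_ids_to_tokens(targets.tolist())
    return list(zip(tokens, bits.tolist()))

# Hypothetical minimal pair: cataphoric vs. non-cataphoric order.
for s in ["When he gets to work, John likes to drink a cup of coffee.",
          "When John gets to work, he likes to drink a cup of coffee."]:
    print(s)
    for tok, surp in token_surprisals(s):
        print(f"  {tok!r}: {surp:.2f} bits")
```

In the targeted evaluation paradigm, surprisals at the critical region (here, the second mention) would then be compared against human reading-time or judgment data from the original studies.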