Clippers 01/29 Nanjiang Jiang on evaluating models of speaker commitment

Evaluating state-of-the-art models of speaker commitment

When a speaker, Mary, utters “John did not discover that Bill lied”, we take Mary to be committed to Bill having lied, whereas in “John didn’t say that Bill lied”, we do not. Extracting such inferences arising from speaker commitment (also known as event factuality) is crucial for information extraction and question answering. In this talk, we evaluate state-of-the-art models for speaker commitment and natural language inference on the CommitmentBank, an English dataset of naturally occurring discourses annotated with speaker commitment towards the content of the complement (“lied” in the example) of clause-embedding verbs (“discover”, “say”) under four entailment-canceling environments (negation, conditional, question, and modal). The CommitmentBank thus focuses on specific linguistic constructions and can be viewed as containing “adversarial” examples for speaker commitment models. We perform a detailed error analysis of the models’ outputs by breaking down items into classes according to various linguistic features. We show that these models can achieve good performance on certain classes of items, but fail to generalize to the diverse linguistic constructions that are present in natural language, highlighting directions for improvement.
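
As a rough illustration of the kind of per-class breakdown the abstract describes, the sketch below groups items by their entailment-canceling environment and computes a mean absolute error for each group. Everything in it is an assumption made for illustration: the item tuples, the scores, the choice of mean absolute error as the metric, and the helper name `mae_by_class` are hypothetical and are not taken from the talk or from the CommitmentBank.

```python
from collections import defaultdict

# Hypothetical items: (environment, gold score, predicted score).
# All values are made up for illustration; they are not CommitmentBank data.
items = [
    ("negation", 2.5, 1.5),
    ("negation", -1.0, 0.0),
    ("modal", 1.0, 1.0),
    ("question", 0.5, -0.5),
    ("conditional", 2.0, 0.0),
]


def mae_by_class(items):
    """Return the mean absolute error for each class label."""
    errors = defaultdict(list)
    for label, gold, pred in items:
        errors[label].append(abs(gold - pred))
    return {label: sum(errs) / len(errs) for label, errs in errors.items()}


breakdown = mae_by_class(items)
for label, mae in sorted(breakdown.items()):
    print(f"{label}: {mae:.2f}")
```

The same grouping could be keyed on any other linguistic feature, such as the embedding verb, to see which classes of items a model handles well and which it does not.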