BERT is state-of-the-art for event factuality, but still fails on pragmatics
Event factuality prediction is the task of predicting whether an event described in the text is factual or not. It is a complex semantic phenomenon that is important for various NLP downstream tasks e.g. information extraction. For example, in Trump thinks he knows better than the doctors about coronavirus, it is crucial that an information extraction system can identify that Trump knows better than the doctors about coronavirus is nonfactual. Although BERT has boosted the performance of various natural language understanding tasks, its applications to event factuality has been limited to the set-up of natural language inference. In this paper, we investigate how well BERT performs on seven event factuality datasets. We found that although BERT can obtain the new state-of-the-art performance on four existing datasets, it does so by exploiting common surface patterns that correlate with certain factuality labels, while fails on instances where pragmatic reasoning overrides. Unlike the high performance suggests, we are still far away from having a robust system for event factuality prediction.