BERT is state-of-the-art for event factuality, but still fails on pragmatics
Event factuality prediction is the task of predicting whether an event described in a text is factual or not. It is a complex semantic phenomenon that is important for various downstream NLP tasks, e.g., information extraction. For example, in "Trump thinks he knows better than the doctors about coronavirus", it is crucial that an information extraction system can identify that "Trump knows better than the doctors about coronavirus" is nonfactual. Although BERT has boosted performance on various natural language understanding tasks, its application to event factuality has been limited to the natural language inference set-up. In this paper, we investigate how well BERT performs on seven event factuality datasets. We find that although BERT obtains new state-of-the-art performance on four existing datasets, it does so by exploiting common surface patterns that correlate with certain factuality labels, while failing on instances where pragmatic reasoning overrides those patterns. Contrary to what the high performance suggests, we are still far from having a robust system for event factuality prediction.
Models of human sentence processing effort tend to focus on costs
associated with retrieving structures and discourse referents from
memory (memory-based) and/or on costs associated with anticipating
upcoming words and structures based on contextual cues (expectation-based).
Although evidence suggests that expectation and memory may play
separable roles in language comprehension (Levy et al., 2013), theories of
coreference processing have largely focused on memory: how comprehenders
identify likely referents of linguistic expressions.
In this study, we hypothesize that coreference tracking also informs
human expectations about upcoming words, and we test this hypothesis by
evaluating the degree to which incremental surprisal measures generated
by a novel coreference-aware semantic parser explain human response
times in a naturalistic self-paced reading experiment.
Results indicate (1) that coreference information indeed guides human
expectations and (2) that coreference effects on memory retrieval exist
independently of coreference effects on expectations.
Together, these findings suggest that the language processing system
exploits coreference information both to retrieve referents from memory
and to anticipate upcoming material.
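
The surprisal measure mentioned above is conventionally defined as the negative log probability of a word given its preceding context. The following is a minimal sketch of that definition only. The probability table is hypothetical and merely stands in for the output of a parser, which the abstract does not describe in enough detail to reproduce.

```python
import math


def surprisal(prob: float) -> float:
    """Return surprisal in bits: -log2 P(word | preceding context)."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


# Hypothetical conditional probabilities P(word | context); in the study
# these would come from the incremental parser, not from a fixed table.
word_probs = [("the", 0.5), ("doctor", 0.125), ("she", 0.25)]

# Per-word surprisal values, the kind of predictor that would then be
# entered into a regression against reading times.
surprisals = [(word, surprisal(p)) for word, p in word_probs]
```

Less probable words receive higher surprisal, which is the sense in which expectation-based accounts predict longer response times for them.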
Modeling incremental sentence processing with relational graph convolutional networks
We present an incremental model of sentence processing in which syntactic and semantic information influence each other in an interactive manner. To this end, a PCFG-based left-corner parser (van Schijndel et al. 2013) has previously been extended to incorporate the semantic dependency predicate context (i.e. pair; Levy & Goldberg, 2014) associated with each node in the parse tree. In order to further improve the accuracy and generalizability of this model, dense representations of semantic predicate contexts and syntactic categories are learned and utilized as features for making parsing decisions. More specifically, a relational graph convolutional network (RGCN; Schlichtkrull et al. 2018) is trained to learn representations for predicates, as well as role functions for cuing the representation associated with each of its arguments. In addition, syntactic category embeddings are learned jointly with the parsing sub-models to minimize cross-entropy loss. Ultimately, the goal of the model is to provide a measure of predictability that is sensitive to semantic context, which in turn will serve as a baseline for testing claims about the nature of human sentence processing.
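
For readers unfamiliar with RGCNs, the sketch below illustrates the generic propagation rule from Schlichtkrull et al. (2018): each node sums a self-loop transform of its own features with relation-specific transforms of its neighbours, averaged per relation, and then applies a ReLU. It is a pure-Python illustration of that rule under assumed toy inputs, not the parser model described above. The function names, the relation label "arg1", and the weight matrices are all hypothetical.

```python
from collections import defaultdict


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rgcn_layer(h, edges, W_rel, W_self):
    """One RGCN layer.

    h:      list of node feature vectors
    edges:  (source, relation, destination) triples
    W_rel:  dict mapping each relation to its weight matrix
    W_self: weight matrix for the self-loop
    """
    # Group the incoming neighbours of each node by relation type.
    incoming = defaultdict(list)
    for src, rel, dst in edges:
        incoming[(dst, rel)].append(src)

    new_h = []
    for i, h_i in enumerate(h):
        total = matvec(W_self, h_i)  # self-loop term
        for (dst, rel), sources in incoming.items():
            if dst != i:
                continue
            for j in sources:
                message = matvec(W_rel[rel], h[j])
                # Normalise by the number of neighbours under this relation.
                total = [t + m / len(sources) for t, m in zip(total, message)]
        new_h.append([max(0.0, t) for t in total])  # ReLU
    return new_h


# Toy graph: nodes 1 and 2 are both "arg1" neighbours of node 0.
identity = [[1.0, 0.0], [0.0, 1.0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(1, "arg1", 0), (2, "arg1", 0)]
updated = rgcn_layer(features, edges, {"arg1": identity}, identity)
```

In a trained model the weight matrices would be learned parameters rather than identity matrices; identities are used here only so the arithmetic is easy to follow by hand.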
Abstract: In this talk, I will present our paper accepted at AAAI 2020. We conduct a data-driven study analyzing and predicting sentence deletion, a prevalent but understudied phenomenon in document-level text simplification, on a large English text simplification corpus. We inspect various discourse-level factors associated with sentence deletion, using a new manually annotated sentence alignment corpus that we collected. We reveal that professional editors use different strategies to meet the readability standards of elementary and middle schools. To predict whether a sentence will be deleted during simplification to a certain level, we harness automatically aligned data to train a classification model. We find that discourse-level factors contribute to the challenging task of predicting sentence deletion for simplification.
Bio: Yang Zhong is a first-year Ph.D. student in the Department of Computer Science and Engineering, advised by Prof. Wei Xu. His research focuses mainly on stylistic variation in language and on document-level text simplification.
Linguistic Marker Discovery with BERT
Detecting politeness in text is a task that has attracted attention in recent years due to its role in identifying abusive language. Previous work has used either feature-based models or deep neural networks for this task. Because they lack context, feature-based models perform significantly worse than modern deep-learning models. We leverage pretrained BERT representations to cluster words based on their context. We show how this yields interpretable contextualized features that can help reduce the performance gap between feature-based models and deep-learning approaches.
What can computational methods do for sociolinguistics?
This talk provides a brief overview of computational sociolinguistics, an emerging field with the twin goals of improving NLP systems using sociolinguistics and of answering sociolinguistic questions using NLP and other computational methods. I briefly discuss what sociolinguistics can do for NLP, then turn to what NLP/computational methods can do for sociolinguistics, using two examples from my research: (1) using SVMs for word sense disambiguation on Twitter data to compare regional variation in African American versus white US English, and (2) using hierarchical cluster analysis to study individual differences in patterns of social meaning. Finally, I discuss future directions for computational sociolinguistics.
Title: A typology of ambiguity in medical concept normalization datasets
Medical concept normalization (MCN; also called biomedical word sense disambiguation) is the task of assigning unique concept identifiers (CUIs) to mentions of biomedical concepts. Several MCN datasets focusing on Electronic Health Record (EHR) data have been developed over the past decade, and while several challenges due to conceptual ambiguity have been identified in methodological research, the types of lexical ambiguity exhibited by clinical MCN datasets have not been systematically studied. I will present preliminary results of an ongoing analysis of benchmark clinical MCN datasets, describing an initial, domain-specific typology of lexical ambiguity in MCN annotations. I will also discuss desiderata for future MCN research aimed at addressing these challenges in both methods and evaluation.
Lexica distinguishing all morphologically related forms of each lexeme are crucial to many downstream technologies, yet building them is expensive. We propose a frugal paradigm completion approach that predicts all related forms in a morphological paradigm from as few manually provided forms as possible. It induces typological information during training which it uses to determine the best sources at test time. We evaluate our language-agnostic approach on 7 diverse languages. Compared to popular alternative approaches, ours reduces manual labor by 16-63% and is the most robust to typological variation.
Discovery of Semantic Factors in Virtual Patient Dialogues
The NLP community has become fixated on very deep Transformer models for semantic classification tasks, but some research suggests these models are not well suited to tasks with a large label space or data scarcity issues, and their speed at inference time is still unacceptable for real-time uses such as dialogue systems. We adapt a simple one-layer recurrent model utilizing a multi-headed self-attention mechanism for a dialogue task with hundreds of labels in a long-tail distribution over a few thousand examples. We demonstrate significant improvements over a strong text CNN baseline on rare labels, by independently forcing the representations of each attention head through low-dimensional bottlenecks. This requires the model to learn efficient representations, thus discovering factors of the (syntacto-)semantics of the input space that generalize from frequent labels to rare labels. The resulting models lend themselves well to interpretation, and analysis shows clear clustering of representations that span labels in ways that align with human understanding of the semantics of the inputs.
Existing resources for paraphrasing, such as WordNet and the PPDB, contain patterns for easily producing paraphrases but cannot fully take into account the contexts in which those patterns are applied. However, words and phrases that are substitutable in one context may not be in another. In this work, we investigate whether BERT’s contextualized word embeddings can be used to predict whether a candidate paraphrase is acceptable by comparing the context of the paraphrase against the context from which the paraphrase rule was extracted. The setting for our investigation is the automatic production of paraphrases to augment data in a question-answering dialogue system. To combat data sparsity, we generate paraphrases by aligning known paraphrases, extracting patterns, and applying those patterns to new sentences. We show that BERT can be used to better identify paraphrases judged acceptable by humans. We use those paraphrases in our downstream dialogue system and show [hopefully] improved accuracy in identifying sparse labels.
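
One simple way to picture the context comparison described above is as a similarity check between two embedding vectors. The abstract does not say how the two contexts are actually compared, so the sketch below is only an assumption: it uses cosine similarity with a hypothetical threshold, and short hand-written vectors stand in for BERT embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def is_acceptable(source_context, target_context, threshold=0.8):
    """Accept a candidate paraphrase when the context where the rule is
    applied resembles the context where the rule was extracted.
    The threshold value is an illustrative assumption."""
    return cosine_similarity(source_context, target_context) >= threshold


# Toy vectors standing in for contextual embeddings (hypothetical values).
extraction_context = [1.0, 0.0, 1.0]
similar_context = [1.0, 0.0, 1.0]
different_context = [0.0, 1.0, 0.0]

accept_similar = is_acceptable(extraction_context, similar_context)
accept_different = is_acceptable(extraction_context, different_context)
```

A learned classifier over the two context representations would be an equally plausible design; the threshold version is shown only because it is the shortest to state.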