Clippers 1/28: Denis Newman-Griffis on A Typology of Ambiguity in Medical Concept Normalization Datasets

Title: A typology of ambiguity in medical concept normalization datasets

Medical concept normalization (MCN; also called biomedical word sense disambiguation) is the task of assigning concept unique identifiers (CUIs) to mentions of biomedical concepts. Several MCN datasets focusing on Electronic Health Record (EHR) data have been developed over the past decade, and while methodological research has identified a number of challenges due to conceptual ambiguity, the types of lexical ambiguity exhibited by clinical MCN datasets have not been systematically studied. I will present preliminary results of an ongoing analysis of benchmark clinical MCN datasets, describing an initial, domain-specific typology of lexical ambiguity in MCN annotations. I will also discuss desiderata for future MCN research aimed at addressing these challenges in both methods and evaluation.

Clippers 1/21: Alex Erdmann on Unsupervised Morphology

Lexica distinguishing all morphologically related forms of each lexeme are crucial to many downstream technologies, yet building them is expensive. We propose a frugal paradigm completion approach that predicts all related forms in a morphological paradigm from as few manually provided forms as possible. It induces typological information during training which it uses to determine the best sources at test time. We evaluate our language-agnostic approach on 7 diverse languages. Compared to popular alternative approaches, ours reduces manual labor by 16-63% and is the most robust to typological variation.
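
To make the setting concrete, here is a toy sketch of paradigm completion via suffix-replacement rules induced from a fully observed exemplar paradigm. It illustrates the task only, not the proposed approach; the function names, cell labels, and single-source heuristic are all assumptions.

def suffix_rule(src, tgt):
    """Longest-common-prefix rule: e.g. talked -> talking yields ("ed", "ing")."""
    i = 0
    while i < min(len(src), len(tgt)) and src[i] == tgt[i]:
        i += 1
    return src[i:], tgt[i:]

def complete_paradigm(observed, exemplar):
    """observed: {cell: form}, the few manually provided forms of a new lexeme.
    exemplar: a fully observed {cell: form} paradigm used to induce rules."""
    paradigm = dict(observed)
    # Naive source choice; the frugal approach instead learns, from induced
    # typological information, which source cell is most reliable per target.
    src_cell, src_form = next(iter(observed.items()))
    for cell, ex_form in exemplar.items():
        if cell in paradigm:
            continue
        strip, add = suffix_rule(exemplar[src_cell], ex_form)
        if src_form.endswith(strip):
            paradigm[cell] = src_form[:len(src_form) - len(strip)] + add
    return paradigm

# e.g. complete_paradigm({"PST": "walked"},
#                        {"PST": "talked", "PRS.PTCP": "talking"})
#      -> {"PST": "walked", "PRS.PTCP": "walking"}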

Clippers 1/14: Adam Stiff on Discovery of Semantic Factors in Virtual Patient Dialogues

Discovery of Semantic Factors in Virtual Patient Dialogues

The NLP community has become fixated on very deep Transformer models for semantic classification tasks, but some research suggests these models are not well suited to tasks with a large label space or data scarcity issues, and their speed at inference time is still unacceptable for real-time uses such as dialogue systems. We adapt a simple one-layer recurrent model utilizing a multi-headed self-attention mechanism for a dialogue task with hundreds of labels in a long-tail distribution over a few thousand examples. We demonstrate significant improvements over a strong text CNN baseline on rare labels, by independently forcing the representations of each attention head through low-dimensional bottlenecks. This requires the model to learn efficient representations, thus discovering factors of the (syntacto-)semantics of the input space that generalize from frequent labels to rare labels. The resulting models lend themselves well to interpretation, and analysis shows clear clustering of representations that span labels in ways that align with human understanding of the semantics of the inputs.
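
A minimal sketch of the mechanism described above, assuming illustrative dimensions (e.g., bottleneck_dim=8), a GRU encoder, and no padding mask; it is not the author's implementation:

import torch
import torch.nn as nn

class BottleneckAttentionClassifier(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hid_dim=256,
                 n_heads=8, bottleneck_dim=8, n_labels=300):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True, bidirectional=True)
        enc_dim = 2 * hid_dim
        # One attention scorer per head over the recurrent states.
        self.att = nn.Linear(enc_dim, n_heads)
        # Independent low-dimensional bottleneck per head.
        self.bottlenecks = nn.ModuleList(
            [nn.Linear(enc_dim, bottleneck_dim) for _ in range(n_heads)])
        self.out = nn.Linear(n_heads * bottleneck_dim, n_labels)

    def forward(self, tokens):                       # tokens: (batch, seq)
        h, _ = self.rnn(self.embed(tokens))          # (batch, seq, enc_dim)
        weights = torch.softmax(self.att(h), dim=1)  # attention over time per head
        summaries = []
        for k, proj in enumerate(self.bottlenecks):
            head = (weights[..., k:k+1] * h).sum(dim=1)  # weighted sum over time
            summaries.append(torch.tanh(proj(head)))     # squeeze through bottleneck
        return self.out(torch.cat(summaries, dim=-1))    # label logits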

Clippers 12/3: David King on BERT for Detecting Paraphrase Context Comparability

Existing resources for paraphrasing such as WordNet and the PPDB contain patterns for easily producing paraphrases but cannot fully account for the contexts in which those patterns are applied. However, words and phrases that are substitutable in one context may not be in another. In this work, we investigate whether BERT’s contextualized word embeddings can be used to predict whether a candidate paraphrase is acceptable by comparing the context of the paraphrase against the context from which the paraphrase rule was extracted. The setting for our investigation is automatically producing paraphrases for augmenting data in a question-answering dialogue system. We generate paraphrases by aligning known paraphrases, extracting patterns, and applying those patterns to new sentences to combat data sparsity. We show that BERT can be used to better identify paraphrases judged acceptable by humans. We use those paraphrases in our downstream dialogue system and hope to show improved accuracy in identifying sparse labels.
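
As a hedged illustration of the core comparison, the sketch below scores a rule's transferability by cosine similarity between BERT's contextualized vectors for the target phrase in the extraction context and in the new context. The naive wordpiece matching and the checkpoint are simplifying assumptions, not the actual pipeline:

import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")   # assumed checkpoint
bert = AutoModel.from_pretrained("bert-base-uncased")

def phrase_vector(sentence, phrase):
    """Mean of the final-layer vectors over the phrase's wordpieces."""
    enc = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        states = bert(**enc).last_hidden_state[0]          # (seq_len, hidden)
    # Naive wordpiece matching; a real system would track character offsets.
    pieces = tok(phrase, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    for i in range(len(ids) - len(pieces) + 1):
        if ids[i:i + len(pieces)] == pieces:
            return states[i:i + len(pieces)].mean(dim=0)
    raise ValueError("phrase not found in sentence")

def context_comparability(extraction_sent, new_sent, phrase):
    """Higher similarity -> the new context better matches the rule's context."""
    a = phrase_vector(extraction_sent, phrase)
    b = phrase_vector(new_sent, phrase)
    return torch.cosine_similarity(a, b, dim=0).item()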

Clippers 11/26: Cory Shain on dissociating syntactic and semantic processing / Evan Jaffe on coreference and incremental surprisal

Title: Status report: Dissociating syntactic and semantic processing with disentangled deep contextualized representations
Presenter: Cory Shain
Abstract: Psycholinguists and cognitive scientists have long hypothesized that building syntactic structures on the one hand and building meaning representations on the other may be supported by functionally distinct components of the human sentence processing system. This idea is typically studied in controlled settings, using stimuli designed to independently manipulate syntactic and semantic processing demands (e.g. using “syntactic” vs. “semantic” violations), a paradigm which suffers from poor ecological validity and an inability to quantify the degree to which an experimental manipulation truly disentangles syntax and semantics. In this study, we follow recent work in natural language processing in attempting to learn deep contextualized word representations that automatically disentangle syntactic and semantic dimensions, using multi-task adversarial learning to encourage/discourage syntactic or semantic content in each part of the representation space. In contrast to prior work in this domain, our system produces strictly incremental word-level representations in addition to utterance-level representations, enabling us to use it to study online incremental processing patterns. Early pilot results suggest that our model effectively disentangles syntax and semantics, paving the way for using its contextualized encodings to study behavioral and neural measures of human sentence processing in more naturalistic settings.
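
The abstract does not specify the adversarial mechanism, but a common realization of multi-task adversarial disentanglement is gradient reversal (Ganin & Lempitsky, 2015); the sketch below, with assumed probe modules and an even split of the representation, shows the idea rather than the system described in the talk:

import torch

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; multiplies gradients by -lam going back."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

def disentangled_loss(z, syn_probe, sem_probe, adv_syn_probe, adv_sem_probe,
                      syn_y, sem_y, loss_fn, lam=1.0):
    # Split each word's representation into a "syntactic" and a "semantic" half.
    z_syn, z_sem = z.chunk(2, dim=-1)
    # Encourage each half to predict its own task...
    keep = loss_fn(syn_probe(z_syn), syn_y) + loss_fn(sem_probe(z_sem), sem_y)
    # ...and discourage it (reversed gradients reach the encoder) from
    # carrying the other task's signal.
    leak = loss_fn(adv_syn_probe(GradReverse.apply(z_sem, lam)), syn_y) + \
           loss_fn(adv_sem_probe(GradReverse.apply(z_syn, lam)), sem_y)
    return keep + leak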

Title: Status report: Coreference Resolution Improves Incremental Surprisal Estimation
Presenter: Evan Jaffe
Abstract: Coreference is an attractive phenomenon to examine for memory-based processing effects, given that it links current and past material in discourse to form useful representations of meaning. Memory decay is a neat explanation for distance-based processing effects, and there are results showing individuals with amnesia or Alzheimer’s have degraded usage of pronouns and referring expressions. However, prediction-based effects are also a popular topic in sentence processing, resulting in numerous studies using incremental surprisal to model human behavior. Previous work (Jaffe et al. 2018) found a potential memory effect for a coreference-based predictor called MentionCount when regressed against human reading time data, but did not control for the possibility of coreference driving prediction effects. Two experiments are presented, showing 1) the value of adding coreference resolution to an existing parser-based incremental surprisal estimate, and 2) that MentionCount remains a significant predictor even when the baseline surprisal estimate incorporates coreference.
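
For readers unfamiliar with this style of analysis, the sketch below fits nested regressions of reading times on surprisal, with and without MentionCount, and compares them. The column names and plain OLS setup are illustrative assumptions; the actual studies fit richer models with additional controls:

import pandas as pd
import statsmodels.formula.api as smf

data = pd.read_csv("reading_times.csv")   # hypothetical per-word measures
base = smf.ols("rt ~ surprisal + word_len + word_pos", data=data).fit()
full = smf.ols("rt ~ surprisal + word_len + word_pos + mention_count",
               data=data).fit()
# Does MentionCount explain reading times over and above the
# (coreference-aware) surprisal baseline?
lr_stat, p_value, df_diff = full.compare_lr_test(base)
print(lr_stat, p_value, df_diff)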

Clippers 11/19: Byung-Doh Oh on Incremental Sentence Processing

Modeling incremental sentence processing with relational graph convolutional networks

We present an incremental sentence processing model in which syntactic and semantic information influence each other in an interactive manner. To this end, a PCFG-based left-corner parser (van Schijndel et al. 2013) has previously been extended to incorporate the semantic dependency predicate context (i.e., a predicate-role pair; Levy & Goldberg, 2014) associated with each node in the tree. In order to further improve the performance and generalizability of this model, dense representations of semantic predicate contexts and syntactic categories are learned and utilized as features for making left-corner parsing decisions. More specifically, a relational graph convolutional network (RGCN; Schlichtkrull et al. 2018) is trained to learn representations for predicates, as well as role functions for cuing the representation associated with each of its arguments. In addition, syntactic category embeddings are learned together with the left-corner parsing sub-models to minimize cross-entropy loss. Ultimately, the goal of the model is to provide a measure of predictability that is sensitive to semantic context, which in turn will serve as a baseline for testing claims about the nature of human sentence processing.
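
A minimal, self-contained sketch of one relational graph convolution layer in the cited formulation (Schlichtkrull et al. 2018), h_i' = σ(W_0 h_i + Σ_r Σ_{j in N_r(i)} (1/c_{i,r}) W_r h_j), with a dense adjacency encoding assumed for brevity:

import torch
import torch.nn as nn

class RGCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim, n_relations):
        super().__init__()
        self.self_loop = nn.Linear(in_dim, out_dim, bias=False)   # W_0
        self.rel = nn.ModuleList(                                 # one W_r per relation
            [nn.Linear(in_dim, out_dim, bias=False) for _ in range(n_relations)])

    def forward(self, h, adj):
        # h:   (n_nodes, in_dim) node features
        # adj: (n_relations, n_nodes, n_nodes); adj[r, i, j] = 1 if j ->r i
        out = self.self_loop(h)
        for r, w in enumerate(self.rel):
            deg = adj[r].sum(dim=1, keepdim=True).clamp(min=1)    # c_{i,r}
            out = out + (adj[r] @ w(h)) / deg                     # normalized messages
        return torch.relu(out)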

Clippers 11/12: Mounica Maddela on Hashtag Segmentation

Multi-task Pairwise Neural Ranking for Hashtag Segmentation

Mounica Maddela

Hashtags are often employed on social media with the goal of increasing discoverability, aiding search, or providing additional semantics. However, the semantic content of hashtags is not straightforward to infer, as these represent ad-hoc conventions which frequently join multiple words together and can include abbreviations and unorthodox spellings. We build a dataset of 12,594 hashtags split into individual segments and propose a set of approaches for hashtag segmentation by framing it as a pairwise ranking problem between candidate segmentations. Our novel neural approaches demonstrate a 24.6% error reduction in hashtag segmentation accuracy compared to the current state-of-the-art method. Finally, we demonstrate that a deeper understanding of hashtag semantics obtained through segmentation is useful for downstream applications such as sentiment analysis, for which we achieve a 2.6% increase in average recall on the SemEval 2017 sentiment analysis dataset.
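
To make the framing concrete, here is a small sketch of candidate generation and pairwise ranking for a single hashtag; score_pair stands in for the learned neural comparator and is an assumption, not the paper's model:

from itertools import combinations

def candidate_segmentations(tag, max_splits=3):
    """All ways to insert up to max_splits boundaries into the hashtag."""
    cands = []
    for k in range(max_splits + 1):
        for cut in combinations(range(1, len(tag)), k):
            prev, pieces = 0, []
            for c in cut:
                pieces.append(tag[prev:c])
                prev = c
            pieces.append(tag[prev:])
            cands.append(" ".join(pieces))
    return cands

def best_segmentation(tag, score_pair):
    """score_pair(a, b) > 0 iff segmentation a should outrank b (a learned comparator)."""
    best = None
    for cand in candidate_segmentations(tag):
        if best is None or score_pair(cand, best) > 0:
            best = cand
    return best

# e.g. best_segmentation("nowplaying", score_pair) might return "now playing"
# given a comparator that prefers in-vocabulary words.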

Clippers 10/29: Nanjiang Jiang on Evaluating BERT for natural language inference: A case study on the CommitmentBank

Evaluating BERT for natural language inference: A case study on the CommitmentBank

Nanjiang Jiang and Marie-Catherine de Marneffe

Natural language inference (NLI) datasets (e.g., MultiNLI) were collected by soliciting hypotheses for a given premise from annotators. Such data collection led to annotation artifacts: systems can identify the premise-hypothesis relationship without observing the premise (e.g., negation in the hypothesis being indicative of contradiction). We address this problem by recasting the CommitmentBank for NLI, which contains items involving reasoning over the extent to which a speaker is committed to complements of clause-embedding verbs under entailment-canceling environments (conditional, negation, modal and question). Instead of being constructed to stand in certain relationships with the premise, hypotheses in the recast CommitmentBank are the complements of the clause-embedding verb in each premise, leading to no annotation artifacts in the hypothesis. A state-of-the-art BERT-based model performs well on the CommitmentBank with 85% F1. However, analysis of model behavior shows that the BERT models still do not capture the full complexity of pragmatic reasoning, nor encode some of the linguistic generalizations, highlighting room for improvement.
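
For orientation, the sketch below shows the standard BERT sequence-pair setup such an evaluation builds on. The checkpoint, label inventory, and example item are assumptions, and the classification head would need fine-tuning on the recast data before its outputs mean anything:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-base-uncased")   # assumed checkpoint
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3)   # e.g. entailment / contradiction / neutral

# Hypothetical recast item: the hypothesis is the embedded complement.
premise = "She didn't notice that the door was open."
hypothesis = "The door was open."

enc = tok(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits
print(logits.softmax(-1))   # meaningful only after fine-tuning on the recast data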

Clippers 10/22: Ilana Heintz on multimodal processing and cross-lingual relation extraction at BBN

Multimodal processing and cross-lingual relation extraction at BBN

I will show the architecture of a system we have built to process visual, audio, and text information in parallel to support hypothesis generation. Then I will discuss a specific research thrust on relation extraction, a text-based technology, using BERT embeddings and annotation projection to extract relations in Russian and Ukrainian.
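
As a rough sketch of the annotation-projection idea (with an assumed data format; this is not BBN's system), relation annotations can be copied from English onto a translation through word alignments:

def project_span(span, alignment):
    """span: (start, end) token offsets in the source sentence.
    alignment: list of (src_idx, tgt_idx) word-alignment links."""
    tgt = sorted(t for s, t in alignment if span[0] <= s < span[1])
    return (tgt[0], tgt[-1] + 1) if tgt else None

def project_relation(relation, alignment):
    """relation: {"label": ..., "arg1": span, "arg2": span} over source tokens."""
    arg1 = project_span(relation["arg1"], alignment)
    arg2 = project_span(relation["arg2"], alignment)
    if arg1 and arg2:
        return {"label": relation["label"], "arg1": arg1, "arg2": arg2}
    return None   # drop relations whose arguments do not project cleanly

# e.g. project_relation({"label": "employee_of", "arg1": (0, 2), "arg2": (4, 6)},
#                       alignment=[(0, 1), (1, 2), (4, 3), (5, 4)])
#      -> {"label": "employee_of", "arg1": (1, 3), "arg2": (3, 5)}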

Clippers 10/1: Peter Plantinga on Mispronunciation Detection for Kids’ Speech

Real-time Mispronunciation Detection for Kids’ Speech

Modern mispronunciation detection and diagnosis systems have seen significant gains in accuracy due to the introduction of deep learning. However, these systems have not been evaluated for their ability to run in real time, an important factor in applications that provide rapid feedback. In particular, the state-of-the-art uses bi-directional recurrent networks, where a uni-directional network may be more appropriate. Teacher-student learning is a natural approach to improving a uni-directional model, but when using a CTC objective, this is limited by poor alignment of outputs to evidence. We address this limitation with two loss terms for improving the alignments of our models. One is an “alignment loss” term that encourages outputs only when features do not resemble silence. The other uses a uni-directional model as a teacher to align the bi-directional model; our proposed student model then uses these aligned bi-directional models as its teachers. Experiments on the CSLU kids’ corpus show that these changes decrease the latency of the outputs and improve the detection rates, with a trade-off between these goals.
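
A rough sketch of how the described “alignment loss” could be combined with CTC, penalizing non-blank output probability on frames flagged as silence. The shapes, mask construction, and weighting are assumptions, not the paper's exact formulation:

import torch
import torch.nn.functional as F

def ctc_with_alignment_loss(log_probs, targets, input_lens, target_lens,
                            silence_mask, blank=0, alpha=0.1):
    """log_probs: (time, batch, vocab) log-softmax outputs of the student.
    silence_mask: (batch, time), 1.0 where features resemble silence
    (e.g., a low-energy heuristic); alpha is an assumed weighting."""
    ctc = F.ctc_loss(log_probs, targets, input_lens, target_lens, blank=blank)
    # Per-frame probability of emitting any real (non-blank) symbol.
    non_blank = 1.0 - log_probs.exp()[..., blank]          # (time, batch)
    # Penalize real emissions on silence frames only.
    align = (non_blank * silence_mask.transpose(0, 1)).mean()
    return ctc + alpha * align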