Posts

Clippers 1/14: Adam Stiff on Discovery of Semantic Factors in Virtual Patient Dialogues

Discovery of Semantic Factors in Virtual Patient Dialogues

The NLP community has become fixated on very deep Transformer models for semantic classification tasks, but some research suggests these models are not well suited to tasks with a large label space or data scarcity issues, and their speed at inference time is still unacceptable for real-time uses such as dialogue systems. We adapt a simple one-layer recurrent model utilizing a multi-headed self-attention mechanism for a dialogue task with hundreds of labels in a long-tail distribution over a few thousand examples. We demonstrate significant improvements over a strong text CNN baseline on rare labels, by independently forcing the representations of each attention head through low-dimensional bottlenecks. This requires the model to learn efficient representations, thus discovering factors of the (syntacto-)semantics of the input space that generalize from frequent labels to rare labels. The resulting models lend themselves well to interpretation, and analysis shows clear clustering of representations that span labels in ways that align with human understanding of the semantics of the inputs.
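
The bottleneck mechanism is easy to picture in code. Below is a minimal PyTorch sketch of the idea as described above: each attention head's context vector is forced through its own low-dimensional projection before the heads are recombined, so each head must commit to a compact, factor-like representation. All dimensions, module names, and the tanh nonlinearity are illustrative choices, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BottleneckedSelfAttention(nn.Module):
    """Multi-headed self-attention in which each head's context vector is
    forced through a low-dimensional bottleneck before the heads are merged."""

    def __init__(self, hidden_dim=256, num_heads=8, bottleneck_dim=4):
        super().__init__()
        assert hidden_dim % num_heads == 0
        self.h, self.d = num_heads, hidden_dim // num_heads
        self.qkv = nn.Linear(hidden_dim, 3 * hidden_dim)
        # Independent down/up projections per head: the low dimension forces
        # each head to learn an efficient representation of its input.
        self.down = nn.ModuleList(nn.Linear(self.d, bottleneck_dim) for _ in range(num_heads))
        self.up = nn.ModuleList(nn.Linear(bottleneck_dim, self.d) for _ in range(num_heads))

    def forward(self, x):                       # x: (batch, seq, hidden)
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (z.view(b, t, self.h, self.d).transpose(1, 2) for z in (q, k, v))
        attn = F.softmax(q @ k.transpose(-2, -1) / self.d ** 0.5, dim=-1)
        ctx = attn @ v                          # (batch, heads, seq, head_dim)
        squeezed = [self.up[i](torch.tanh(self.down[i](ctx[:, i])))
                    for i in range(self.h)]
        return torch.cat(squeezed, dim=-1)      # (batch, seq, hidden)
```

The bottleneck width of four is arbitrary here; the point is only that each head's capacity is far smaller than its nominal head dimension, which is what forces the factorization.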

Clippers 12/3: David King on BERT for Detecting Paraphrase Context Comparability

Existing resources for paraphrasing such as WordNet and the PPDB contain patterns for easily producing paraphrases, but they cannot fully take into account the contexts in which those patterns are applied. However, words and phrases that are substitutable in one context may not be substitutable in another. In this work, we investigate whether BERT’s contextualized word embeddings can be used to predict whether a candidate paraphrase is acceptable by comparing the context of the paraphrase against the context from which the paraphrase rule was extracted. The setting for our investigation is automatically producing paraphrases for augmenting data in a question-answering dialogue system. We generate paraphrases by aligning known paraphrases, extracting patterns, and applying those patterns to new sentences to combat data sparsity. We show that BERT can be used to better identify paraphrases judged acceptable by humans. We use those paraphrases in our downstream dialogue system and show [hopefully] improved accuracy in identifying sparse labels.
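
As a concrete illustration of the comparison step, here is a hedged sketch using the HuggingFace transformers library: the candidate phrase is embedded both in the sentence the rule was extracted from and in the new target sentence, and the rule is applied only if the two contextualized embeddings are close. The mean-pooling strategy, the threshold, and all function names are our own illustrative choices, not the talk's.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def phrase_embedding(sentence: str, phrase: str) -> torch.Tensor:
    """Mean-pool BERT's contextualized vectors over the tokens of `phrase`."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]   # (seq, 768)
    # Locate the phrase's tokens via a simple subsequence match.
    ids = tokenizer(phrase, add_special_tokens=False)["input_ids"]
    toks = enc["input_ids"][0].tolist()
    for i in range(len(toks) - len(ids) + 1):
        if toks[i:i + len(ids)] == ids:
            return hidden[i:i + len(ids)].mean(dim=0)
    raise ValueError("phrase not found in sentence")

def contexts_comparable(rule_phrase, source_sent, target_sent, threshold=0.7):
    """Accept the paraphrase rule in the new context only if the phrase's
    contextualized embedding resembles the one from the extraction context.
    The threshold is a hypothetical tuning parameter."""
    a = phrase_embedding(source_sent, rule_phrase)
    b = phrase_embedding(target_sent, rule_phrase)
    return torch.cosine_similarity(a, b, dim=0).item() >= threshold
```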

Clippers 11/26: Cory Shain on dissociating syntactic and semantic processing / Evan Jaffe on coreference and incremental surprisal

Title: Status report: Dissociating syntactic and semantic processing with disentangled deep contextualized representations
Presenter: Cory Shain
Abstract: Psycholinguists and cognitive scientists have long hypothesized that building syntactic structures on the one hand and building meaning representations on the other may be supported by functionally distinct components of the human sentence processing system. This idea is typically studied in controlled settings, using stimuli designed to independently manipulate syntactic and semantic processing demands (e.g. using “syntactic” vs. “semantic” violations), a paradigm which suffers from poor ecological validity and an inability to quantify the degree to which an experimental manipulation truly disentangles syntax and semantics. In this study, we follow recent work in natural language processing in attempting to learn deep contextualized word representations that automatically disentangle syntactic and semantic dimensions, using multi-task adversarial learning to encourage/discourage syntactic or semantic content in each part of the representation space. In contrast to prior work in this domain, our system produces strictly incremental word-level representations in addition to utterance-level representations, enabling us to use it to study online incremental processing patterns. Early pilot results suggest that our model effectively disentangles syntax and semantics, paving the way for using its contextualized encodings to study behavioral and neural measures of human sentence processing in more naturalistic settings.
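
One common way to implement this style of multi-task adversarial training is with a gradient-reversal layer, sketched below in PyTorch. The representation is split into a "syntactic" half and a "semantic" half; each half is trained on its own task, while an adversary tries to read the other task's labels out of it and the reversed gradient scrubs that information away. This is a generic sketch of the technique, not the authors' actual architecture, and every module name here is hypothetical.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negated, scaled gradient backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x
    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None

def disentangle_loss(encoder, syn_head, sem_head, syn_adv, sem_adv,
                     words, syn_labels, sem_labels, lam=1.0):
    """Multi-task adversarial objective (illustrative): each half of the
    encoding predicts its own task, and gradient reversal pushes the
    encoder to remove the other task's signal from that half."""
    h = encoder(words)                       # (batch, dim)
    h_syn, h_sem = h.chunk(2, dim=-1)        # split the representation space
    ce = nn.functional.cross_entropy
    loss = ce(syn_head(h_syn), syn_labels) + ce(sem_head(h_sem), sem_labels)
    # Adversaries try to read the *wrong* signal out of each half.
    loss += ce(sem_adv(GradReverse.apply(h_syn, lam)), sem_labels)
    loss += ce(syn_adv(GradReverse.apply(h_sem, lam)), syn_labels)
    return loss
```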

Title: Status report: Coreference Resolution Improves Incremental Surprisal Estimation
Presenter: Evan Jaffe
Abstract: Coreference is an attractive phenomenon to examine for memory-based processing effects, since it links current material to past material in discourse to form useful representations of meaning. Memory decay is a neat explanation for distance-based processing effects, and there are results showing that individuals with amnesia or Alzheimer’s make degraded use of pronouns and referring expressions. However, prediction-based effects are also a popular topic in sentence processing, resulting in numerous studies using incremental surprisal to model human behavior. Previous work (Jaffe et al. 2018) found a potential memory effect for a coreference-based predictor called MentionCount when regressed against human reading time data, but did not control for the possibility of coreference driving prediction effects. Two experiments are presented that 1) show the value of adding coreference resolution to an existing parser-based incremental surprisal estimate, and 2) show a significant effect of MentionCount even when the baseline surprisal includes coreference.
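
For reference, the measure both experiments build on is incremental surprisal, the negative log probability of each word given the words before it. The coreference-augmented estimate changes what this probability is conditioned on (here, the parser's incremental state enriched with coreference information), not the measure itself:

```latex
\[
  S(w_t) \;=\; -\log P\left(w_t \mid w_1, \ldots, w_{t-1}\right)
\]
```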

Clippers 11/19: Byung-Doh Oh on Incremental Sentence Processing

Modeling incremental sentence processing with relational graph convolutional networks

We present an incremental sentence processing model in which syntactic and semantic information influence each other in an interactive manner. To this end, a PCFG-based left-corner parser (van Schijndel et al. 2013) has previously been extended to incorporate the semantic dependency predicate context (i.e., a ⟨predicate, argument⟩ pair; Levy & Goldberg, 2014) associated with each node in the tree. In order to further improve the performance and generalizability of this model, dense representations of semantic predicate contexts and syntactic categories are learned and utilized as features for making left-corner parsing decisions. More specifically, a relational graph convolutional network (RGCN; Schlichtkrull et al. 2018) is trained to learn representations for predicates, as well as role functions for cuing the representation associated with each of its arguments. In addition, syntactic category embeddings are learned together with the left-corner parsing sub-models to minimize cross-entropy loss. Ultimately, the goal of the model is to provide a measure of predictability that is sensitive to semantic context, which in turn will serve as a baseline for testing claims about the nature of human sentence processing.
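
A relational graph convolution itself is compact; below is a minimal PyTorch sketch of a single RGCN layer over a predicate-argument graph, where each edge type (role) gets its own weight matrix. The sizes and the edge encoding are illustrative, not taken from the model described above.

```python
import torch
import torch.nn as nn

class RGCNLayer(nn.Module):
    """One relational graph convolution (after Schlichtkrull et al. 2018):
    each relation type r has its own weight matrix W_r, and a node's new
    state sums relation-specific transforms of its neighbors plus a
    self-loop transform of itself."""

    def __init__(self, dim, num_relations):
        super().__init__()
        self.rel_weights = nn.Parameter(torch.randn(num_relations, dim, dim) * 0.01)
        self.self_loop = nn.Linear(dim, dim)

    def forward(self, h, edges):
        # h: (num_nodes, dim); edges: (src, rel, dst) triples, e.g. a
        # predicate node linked to each argument node by a role-typed edge.
        out = self.self_loop(h)
        for src, rel, dst in edges:
            msg = (h[src] @ self.rel_weights[rel]).unsqueeze(0)
            out = out.index_add(0, torch.tensor([dst]), msg)
        return torch.relu(out)
```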

Clippers 11/12: Mounica Maddela on Hashtag Segmentation

Multi-task Pairwise Neural Ranking for Hashtag Segmentation

Mounica Maddela

Hashtags are often employed on social media with the goal of increasing discoverability, aiding search, or providing additional semantics. However, the semantic content of hashtags is not straightforward to infer, as they represent ad-hoc conventions that frequently join multiple words together and can include abbreviations and unorthodox spellings. We build a dataset of 12,594 hashtags split into individual segments and propose a set of approaches for hashtag segmentation by framing it as a pairwise ranking problem between candidate segmentations. Our novel neural approaches demonstrate a 24.6% error reduction in hashtag segmentation accuracy compared to the current state-of-the-art method. Finally, we demonstrate that a deeper understanding of hashtag semantics obtained through segmentation is useful for downstream applications such as sentiment analysis, for which we achieve a 2.6% increase in average recall on the SemEval 2017 sentiment analysis dataset.
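
The pairwise framing can be sketched briefly: enumerate candidate segmentations of a hashtag, featurize each, and train a scorer so that the better candidate of any pair wins by a margin. The brute-force candidate generator and margin loss below are generic stand-ins for the paper's neural rankers and feature set, not its actual method.

```python
import itertools
import torch
import torch.nn as nn

def candidate_segmentations(hashtag: str, max_candidates=64):
    """Enumerate splits of a hashtag into segments by inserting spaces,
    e.g. 'icecream' -> 'icecream', 'ice cream', 'i cecream', ..."""
    chars = hashtag.lower()
    cands = []
    for bits in itertools.product([0, 1], repeat=len(chars) - 1):
        seg, out = chars[0], []
        for ch, b in zip(chars[1:], bits):
            if b:
                out.append(seg)
                seg = ch
            else:
                seg += ch
        out.append(seg)
        cands.append(" ".join(out))
        if len(cands) >= max_candidates:
            break
    return cands

class PairwiseRanker(nn.Module):
    """Scores a featurized candidate segmentation; trained so that better
    candidates outscore worse ones by a margin (pairwise ranking)."""
    def __init__(self, feat_dim):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))
    def forward(self, feats):                  # feats: (batch, feat_dim)
        return self.score(feats).squeeze(-1)

def pairwise_loss(score_good, score_bad, margin=1.0):
    return torch.clamp(margin - (score_good - score_bad), min=0).mean()
```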

Clippers 10/29: Nanjiang Jiang, Evaluating BERT for natural language inference: A case study on the CommitmentBank

Evaluating BERT for natural language inference: A case study on the CommitmentBank

Nanjiang Jiang and Marie-Catherine de Marneffe

Natural language inference (NLI) datasets (e.g., MultiNLI) were collected by soliciting hypotheses for a given premise from annotators. Such data collection led to annotation artifacts: systems can identify the premise-hypothesis relationship without observing the premise (e.g., negation in the hypothesis being indicative of contradiction). We address this problem by recasting the CommitmentBank for NLI, which contains items involving reasoning over the extent to which a speaker is committed to complements of clause-embedding verbs under entailment-canceling environments (conditional, negation, modal and question). Instead of being constructed to stand in certain relationships with the premise, hypotheses in the recast CommitmentBank are the complements of the clause-embedding verb in each premise, leading to no annotation artifacts in the hypothesis. A state-of-the-art BERT-based model performs well on the CommitmentBank with 85% F1. However, analysis of model behavior shows that the BERT models still do not capture the full complexity of pragmatic reasoning, nor do they encode some of the linguistic generalizations, highlighting room for improvement.
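
For concreteness, the evaluation setup is the standard BERT sentence-pair classification recipe, which looks roughly like this with the HuggingFace transformers library. A classification head fine-tuned on the recast CommitmentBank labels is assumed (the freshly loaded head below is randomly initialized), and the example sentences are invented.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# num_labels=3 for entailment / neutral / contradiction; fine-tuning on
# the recast CommitmentBank is assumed before predictions are meaningful.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3)

# Hypothesis = the embedded complement lifted out of the premise.
premise = "She didn't know that the results had already been published."
hypothesis = "The results had already been published."

inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
prediction = logits.argmax(dim=-1).item()
```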

Clippers 10/22: Ilana Heintz on multimodal processing and cross-lingual relation extraction at BBN

Multimodal processing and cross-lingual relation extraction at BBN

I will show the architecture of a system we have built to process visual, audio, and text information in parallel to support hypothesis generation. Then I will talk about a specific research thrust into relation extraction, a text-based technology, using BERT embeddings and annotation projection to perform relation extraction in Russian and Ukrainian.
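
Annotation projection, the second thrust mentioned above, is simple to sketch: relation annotations on the English side of parallel text are carried over to the target language through word alignments, yielding silver training data. Everything below, including the data format, the alignment dictionary, and the example, is hypothetical and only meant to convey the general technique.

```python
def project_relation(en_tokens, ru_tokens, alignments, en_relation):
    """Project a relation annotation from English to Russian via word
    alignments. `alignments` maps English token indices to Russian ones
    (e.g. from an automatic aligner); `en_relation` holds the label and
    the English token spans of its two arguments."""
    def project_span(span):
        tgt = sorted(j for i in span for j in alignments.get(i, []))
        return (tgt[0], tgt[-1] + 1) if tgt else None

    arg1 = project_span(range(*en_relation["arg1"]))
    arg2 = project_span(range(*en_relation["arg2"]))
    if arg1 and arg2:
        return {"label": en_relation["label"], "arg1": arg1, "arg2": arg2}
    return None  # discard if either argument fails to project

# Hypothetical usage: silver Russian training data from parallel text.
rel = {"label": "employee_of", "arg1": (0, 1), "arg2": (3, 5)}
silver = project_relation(["Ivan", "works", "at", "BBN", "Technologies"],
                          ["Иван", "работает", "в", "BBN", "Technologies"],
                          {0: [0], 1: [1], 2: [2], 3: [3], 4: [4]}, rel)
```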

Clippers 10/1: Peter Plantinga on Mispronunciation Detection for Kids’ Speech

Real-time Mispronunciation Detection for Kids’ Speech

Modern mispronunciation detection and diagnosis systems have seen significant gains in accuracy due to the introduction of deep learning. However, these systems have not been evaluated for their ability to run in real time, an important factor in applications that provide rapid feedback. In particular, the state of the art uses bi-directional recurrent networks, where a uni-directional network may be more appropriate. Teacher-student learning is a natural approach to improving a uni-directional model, but when using a CTC objective, this is limited by poor alignment of outputs to evidence. We address this limitation with two loss terms for improving the alignments of our models. One is an “alignment loss” term that encourages outputs only when features do not resemble silence. The other uses a uni-directional model as a teacher to align the bi-directional model; our proposed model then uses these aligned bi-directional models as teachers. Experiments on the CSLU kids’ corpus show that these changes decrease the latency of the outputs and improve detection rates, with a trade-off between the two goals.
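
The two loss terms can be sketched generically. The first penalizes non-blank CTC posterior mass on silence-like frames; the second is a standard frame-level distillation term matching the student's distributions to the aligned teacher's. The energy-based silence proxy, the floor value, and the temperature are our own illustrative choices, not the talk's.

```python
import torch
import torch.nn.functional as F

def alignment_loss(log_probs, feats, blank=0, energy_floor=0.01):
    """Penalize non-blank posterior mass on frames whose features resemble
    silence (proxied here by low frame energy). log_probs: (T, vocab)
    per-frame CTC log-posteriors; feats: (T, feat_dim) acoustic features."""
    energy = feats.pow(2).mean(dim=-1)            # crude per-frame energy
    silent = (energy < energy_floor).float()      # 1.0 on silence-like frames
    non_blank = 1.0 - log_probs[:, blank].exp()   # P(non-blank) per frame
    return (silent * non_blank).mean()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Teacher-student term: match the student's frame-level distributions
    to the aligned teacher's (KL divergence with temperature T)."""
    return F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * T * T
```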

Clippers 9/24: Marie de Marneffe on Speaker Commitment

Do you know that there’s still a chance? Identifying speaker commitment for natural language understanding

Marie-Catherine de Marneffe

When we communicate, we infer a lot beyond the literal meaning of the words we hear or read. In particular, our understanding of an utterance depends on assessing the extent to which the speaker stands by the event she describes. An unadorned declarative like “The cancer has spread” conveys firm speaker commitment of the cancer having spread, whereas “There are some indicators that the cancer has spread” imbues the claim with uncertainty. It is not only the absence vs. presence of embedding material that determines whether or not a speaker is committed to the event described: from (1) we will infer that the speaker is committed to there being war, whereas in (2) we will infer the speaker is committed to relocating species not being a panacea, even though the clauses that describe the events in (1) and (2) are both embedded under “(s)he doesn’t believe”.

(1) The problem, I’m afraid, with my colleague here, he really doesn’t believe that it’s war.

(2) Transplanting an ecosystem can be risky, as history shows. Hellmann doesn’t believe that relocating species threatened by climate change is a panacea.

In this talk, I will first illustrate how looking at pragmatic information of what speakers are committed to can improve NLP applications. Previous work has tried to predict the outcome of contests (such as the Oscars or elections) from tweets. I will show that by distinguishing tweets that convey firm speaker commitment toward a given outcome (e.g., “Dunkirk will win Best Picture in 2018”) from ones that only suggest the outcome (e.g., “Dunkirk might have a shot at the 2018 Oscars”) or tweets that convey the negation of the event (“Dunkirk is good but not academy level good for the Oscars”), we can outperform previous methods. Second, I will evaluate current models of speaker commitment, using the CommitmentBank, a dataset of naturally occurring discourses developed to deepen our understanding of the factors at play in identifying speaker commitment. We found that a linguistically informed model outperforms an LSTM-based one, suggesting that linguistic knowledge is needed to achieve robust language understanding. Both models, however, fail to generalize to the diverse linguistic constructions present in natural language, highlighting directions for improvement.

Clippers 9/10: Michael White on Constrained Decoding in Neural NLG

Constrained Decoding for Neural NLG from Compositional Representations in Task-Oriented Dialogue

(joint work with Anusha Balakrishnan, Jinfeng Rao, Kartikeya Upasani and Rajen Subba)

Neural methods for natural language generation (NNLG) arrived with much fanfare a few years ago and became the dominant method employed in the recent E2E NLG Challenge. While neural methods promise flexible, end-to-end trainable models, recent studies have revealed their inability to produce satisfactory output for longer or more complex texts as well as how the black-box nature of these models makes them difficult to control. In this talk, I will propose using tree-structured semantic representations, like those used in traditional rule-based NLG systems, for better discourse-level structuring and sentence-level planning. I will then introduce a constrained decoding approach for sequence-to-sequence models that leverages this representation to improve semantic correctness. Finally, I will demonstrate promising results on a new conversational weather dataset as well as the E2E dataset and discuss remaining challenges.
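
The flavor of the constrained decoding can be conveyed with a small sketch: at each step of beam search over the linearized tree output, tokens that would break the tree structure or introduce content absent from the input meaning representation are masked out. The token conventions here ("[", "]", ARG_ prefixes) are invented for illustration and are not the paper's actual representation.

```python
def constrained_next_tokens(prefix_tokens, input_mr, vocab):
    """Return the subset of the vocabulary allowed at this decoding step.
    The output is a linearized tree; the sketched constraints are that
    brackets must balance and that semantic arguments must come from the
    input meaning representation `input_mr` (a set of allowed tokens)."""
    open_depth = prefix_tokens.count("[") - prefix_tokens.count("]")
    allowed = set()
    for tok in vocab:
        if tok == "]" and open_depth == 0:
            continue                 # nothing left to close
        if tok.startswith("ARG_") and tok not in input_mr:
            continue                 # never emit arguments absent from input
        if tok == "<eos>" and open_depth != 0:
            continue                 # cannot stop with unclosed structure
        allowed.add(tok)
    return allowed
```

In a real decoder this set would be turned into a mask added to the logits before the softmax, so that disallowed continuations receive zero probability on every beam.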