Do you know that there’s still a chance? Identifying speaker commitment for natural language understanding
Marie-Catherine de Marneffe
When we communicate, we infer a lot beyond the literal meaning of the words we hear or read. In particular, our understanding of an utterance depends on assessing the extent to which the speaker stands by the event she describes. An unadorned declarative like “The cancer has spread” conveys firm speaker commitment to the cancer having spread, whereas “There are some indicators that the cancer has spread” imbues the claim with uncertainty. It is not only the absence vs. presence of embedding material that determines whether or not a speaker is committed to the event described: from (1) we will infer that the speaker is committed to there being war, whereas in (2) we will infer the speaker is committed to relocating species not being a panacea, even though the clauses that describe the events in (1) and (2) are both embedded under “(s)he doesn’t believe”.
(1) The problem, I’m afraid, with my colleague here, he really doesn’t believe that it’s war.
(2) Transplanting an ecosystem can be risky, as history shows. Hellmann doesn’t believe that relocating species threatened by climate change is a panacea.
In this talk, I will first illustrate how looking at pragmatic information about what speakers are committed to can improve NLP applications. Previous work has tried to predict the outcome of contests (such as the Oscars or elections) from tweets. I will show that by distinguishing tweets that convey firm speaker commitment toward a given outcome (e.g., “Dunkirk will win Best Picture in 2018”) from ones that only suggest the outcome (e.g., “Dunkirk might have a shot at the 2018 Oscars”) or tweets that convey the negation of the event (“Dunkirk is good but not academy level good for the Oscars”), we can outperform previous methods. Second, I will evaluate current models of speaker commitment, using the CommitmentBank, a dataset of naturally occurring discourses developed to deepen our understanding of the factors at play in identifying speaker commitment. We found that a linguistically informed model outperforms an LSTM-based one, suggesting that linguistic knowledge is needed to achieve robust language understanding. Both models, however, fail to generalize to the diverse linguistic constructions present in natural language, highlighting directions for improvement.
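As a purely illustrative sketch (not the talk's actual model), the three-way distinction among tweets could be approximated by a toy cue-word rule; the cue lists below are invented for illustration:

```python
import re

# Toy illustration of sorting outcome-mentioning tweets into commitment
# classes by surface cues. Cue lists are invented; a real model would be
# trained on annotated data.
NEGATION_CUES = {"not", "no", "never", "won't", "isn't"}
HEDGE_CUES = {"might", "may", "could", "maybe", "shot", "chance"}

def commitment_class(tweet: str) -> str:
    tokens = set(re.findall(r"[a-z']+", tweet.lower()))
    if tokens & NEGATION_CUES:
        return "negated"   # speaker denies the outcome
    if tokens & HEDGE_CUES:
        return "weak"      # outcome only suggested
    return "firm"          # unhedged assertion of the outcome

print(commitment_class("Dunkirk will win Best Picture in 2018"))         # firm
print(commitment_class("Dunkirk might have a shot at the 2018 Oscars"))  # weak
```

Real tweets defeat such keyword rules quickly (e.g., negation embedded under belief verbs, as in examples (1) and (2) above), which is exactly why learned commitment models are needed.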
Constrained Decoding for Neural NLG from Compositional Representations in Task-Oriented Dialogue
(joint work with Anusha Balakrishnan, Jinfeng Rao, Kartikeya Upasani and Rajen Subba)
Neural methods for natural language generation (NNLG) arrived with much fanfare a few years ago and became the dominant method employed in the recent E2E NLG Challenge. While neural methods promise flexible, end-to-end trainable models, recent studies have revealed that they struggle to produce satisfactory output for longer or more complex texts, and that their black-box nature makes them difficult to control. In this talk, I will propose using tree-structured semantic representations, like those used in traditional rule-based NLG systems, for better discourse-level structuring and sentence-level planning. I will then introduce a constrained decoding approach for sequence-to-sequence models that leverages this representation to improve semantic correctness. Finally, I will demonstrate promising results on a new conversational weather dataset as well as the E2E dataset and discuss remaining challenges.
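To make the idea of a tree-structured input concrete, here is a hypothetical sketch of a compositional meaning representation for a weather response, linearized into a bracketed string a seq2seq encoder could consume. The node labels and argument names (CONTRAST, INFORM, condition, date) are illustrative, not the paper's actual schema:

```python
from dataclasses import dataclass, field

# Hypothetical tree-structured meaning representation; labels are
# illustrative stand-ins for discourse relations and dialog acts.
@dataclass
class Node:
    label: str
    args: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def linearize(self) -> str:
        """Flatten the tree into a bracketed string for a seq2seq encoder."""
        inner = " ".join(f"[{k} {v}]" for k, v in self.args.items())
        kids = " ".join(c.linearize() for c in self.children)
        return f"[{self.label} {' '.join(p for p in (inner, kids) if p)}]"

mr = Node("CONTRAST", children=[
    Node("INFORM", {"condition": "sunny", "date": "today"}),
    Node("INFORM", {"condition": "rain", "date": "tomorrow"}),
])
print(mr.linearize())
# [CONTRAST [INFORM [condition sunny] [date today]] [INFORM [condition rain] [date tomorrow]]]
```

Constrained decoding can then restrict generation so that, for instance, every argument in the input tree is realized exactly once in the output.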
fMRI reveals language-specific predictive coding during naturalistic sentence comprehension
Much research in cognitive neuroscience supports prediction as a canonical computation of cognition in many domains. Is such predictive coding implemented by feedback from higher-order domain-general circuits, or is it locally implemented in domain-specific circuits? What information sources are used to generate these predictions? This study addresses these two questions in the context of language processing. We present fMRI evidence from a naturalistic comprehension paradigm (1) that predictive coding in the brain’s response to language is domain-specific, and (2) that these predictions are sensitive both to local word co-occurrence patterns and to hierarchical structure. Using a recently developed continuous-time deconvolutional regression technique that supports data-driven hemodynamic response function discovery from continuous BOLD signal fluctuations in response to naturalistic stimuli, we found effects of prediction measures in the language network but not in the domain-general, multiple-demand network. Moreover, within the language network, surface-level and structural prediction effects were separable. The predictability effects in the language network were substantial, with the model capturing over 37% of explainable variance on held-out data. These findings indicate that human sentence processing mechanisms generate predictions about upcoming words using cognitive processes that are sensitive to hierarchical structure and specialized for language processing, rather than via feedback from high-level executive control mechanisms.
We demonstrate a natural language understanding module for a question-answering dialog agent in a resource-constrained virtual patient domain, which combines both rule-based and machine learning approaches. We further validate the model development work by performing a replication study using live subjects, broadly confirming the findings from the development process using a fixed dataset, but highlighting important deficits. In particular, the hybrid approach continues to show substantial improvements over either rule-based or machine learning approaches individually, even handling unseen classes with some success; however, the system has unexpected difficulty handling out-of-domain questions. We attempt to mitigate this issue with moderate success, and provide analysis of the problem to suggest future improvements.
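The rule-plus-ML hybrid can be sketched at its simplest as a cascade: exact-match rules answer first, and a statistical classifier handles everything else. The rules, labels, and stubbed classifier below are invented for illustration and are not the system's actual components:

```python
# Illustrative cascade: rules take precedence, a (stubbed) learned
# classifier backs them up. Labels and rules here are invented.
RULES = {
    "how old are you": "age",
    "where does it hurt": "pain-location",
}

def ml_classifier(question: str) -> str:
    # Stand-in for a trained model; a real system would score the
    # full label set and could still misfire on out-of-domain input.
    return "out-of-domain"

def classify(question: str) -> str:
    q = question.lower().strip("?! .")
    return RULES.get(q) or ml_classifier(q)

print(classify("Where does it hurt?"))          # pain-location
print(classify("What's your favorite color?"))  # out-of-domain
```

The abstract's finding is that this kind of combination beats either component alone, while out-of-domain questions remain the weak point.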
Learning to disambiguate by combining multiple sense representations
This talk will discuss ongoing work investigating the combination of multiple sense representation methods for word sense disambiguation (WSD). A variety of recent methods have been proposed for learning representations of semantic senses in different domains, and there is some evidence that different methods capture complementary information for WSD. We consider a simple but competitive cosine similarity-based model for WSD, and augment it by learning to produce a context-sensitive linear transformation of representations of candidate senses. In addition to transforming the input sense space, our method allows us to jointly project multiple sense representations into a single space. We find that a single learned projection matches or outperforms directly updated sense embeddings for single embedding methods, and demonstrate that combining multiple representations improves over any individual method alone. Further, by transforming and conjoining complete embedding spaces, we gain the ability to transfer model knowledge to ambiguous terms not seen during training; we are currently investigating the effectiveness of this transfer.
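A minimal sketch of the scoring setup described above, assuming random stand-in embeddings: candidate sense vectors are passed through a linear projection W and scored against a context vector by cosine similarity. In practice W would be learned and senses would come from an inventory such as WordNet:

```python
import numpy as np

# Toy cosine-similarity WSD with a linear projection of sense vectors.
# Embeddings and W are random stand-ins, not trained representations.
rng = np.random.default_rng(0)
dim = 8
senses = {
    "bank_river": rng.normal(size=dim),
    "bank_finance": rng.normal(size=dim),
}
# Toy context vector: a lightly perturbed copy of one sense.
context = senses["bank_river"] + 0.1 * rng.normal(size=dim)
W = np.eye(dim)  # identity = an untrained projection

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def disambiguate(context_vec, candidate_senses, W):
    """Project each candidate sense, score against the context, take argmax."""
    scores = {s: cosine(W @ v, context_vec) for s, v in candidate_senses.items()}
    return max(scores, key=scores.get)

print(disambiguate(context, senses, W))
```

Because the projection operates on the sense space, representations from several embedding methods can be projected into a single shared space and combined under the same scoring rule.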
Evaluating state-of-the-art models of speaker commitment
When a speaker, Mary, utters “John did not discover that Bill lied”, we take Mary to be committed to Bill having lied, whereas in “John didn’t say that Bill lied”, we do not. Extracting such inferences arising from speaker commitment (aka event factuality) is crucial for information extraction and question answering. In this talk, we evaluate the state-of-the-art models for speaker commitment and natural language inference on the CommitmentBank, an English dataset of naturally occurring discourses, annotated with speaker commitment towards the content of the complement (“lied” in the example) of clause-embedding verbs (“discover”, “say”) under four entailment-canceling environments (negation, conditional, question, and modal). The CommitmentBank thus focuses on specific linguistic constructions and can be viewed as containing “adversarial” examples for speaker commitment models. We perform a detailed error analysis of the models’ outputs by breaking down items into classes according to various linguistic features. We show that these models can achieve good performance on certain classes of items, but fail to generalize to the diverse linguistic constructions that are present in natural language, highlighting directions for improvement.
Prediction is All You Need: A Large-Scale Study of the Effects of Word Frequency and Predictability in Naturalistic Reading
A number of psycholinguistic studies have factorially manipulated words’ contextual predictabilities and corpus frequencies and shown separable effects of each on measures of human sentence processing, a pattern which has been used to support distinct processing effects of prediction on the one hand and strength of memory representation on the other. This paper examines the generalizability of this finding to more realistic conditions of sentence processing by studying effects of frequency and predictability in three large-scale naturalistic reading corpora. Results show significant effects of word frequency and predictability in isolation but no effect of frequency over and above predictability, and thus do not provide evidence of distinct effects. The non-replication of separable effects in a naturalistic setting raises doubts about the existence of such a distinction in everyday sentence comprehension. Instead, these results are consistent with previous claims that apparent effects of frequency are underlyingly effects of predictability.
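The "over and above" logic can be illustrated with a toy nested-regression comparison on simulated data: fit reading times with predictability (surprisal) alone, then add log frequency, and ask how much residual variance the extra predictor removes. All numbers below are simulated for illustration; the paper's analyses use real eye-tracking and self-paced reading corpora and proper mixed-effects significance testing:

```python
import numpy as np

# Simulated data: frequency correlates with surprisal (as in corpora),
# but reading times (rt) are driven by surprisal only.
rng = np.random.default_rng(2)
n = 500
surprisal = rng.normal(size=n)
log_freq = -0.8 * surprisal + 0.6 * rng.normal(size=n)
rt = 2.0 * surprisal + rng.normal(size=n)

def rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(((y - X @ beta) ** 2).sum())

base = np.column_stack([np.ones(n), surprisal])       # predictability only
full = np.column_stack([base, log_freq])              # + frequency
# Frequency adds almost nothing over and above predictability here.
print(rss(base, rt) - rss(full, rt) < 0.05 * rss(base, rt))  # True
```

The naturalistic result reported above has the same shape: frequency and predictability each matter in isolation, but frequency contributes no detectable effect once predictability is in the model.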
Improving classification of speech transcripts
Off-the-shelf speech recognition systems can yield useful results and accelerate application development, but general-purpose systems applied to specialized domains can introduce acoustically small but semantically catastrophic errors. Furthermore, sufficient audio data may not be available to develop custom acoustic models for niche tasks. To address these problems, we propose a method to improve performance in text classification tasks that use speech transcripts as input, without any in-domain audio data. Our method augments available typewritten text training data with inferred phonetic information so that the classifier will learn semantically important acoustic regularities, making it more robust to transcription errors from the general-purpose ASR. We successfully pilot our method in a speech-based virtual patient used for medical training, recovering up to 62% of errors incurred by feeding a small test set of speech transcripts to a classification model trained on typescript.
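One way to picture the augmentation idea: replace words in typewritten training data with homophones so the classifier is exposed to ASR-style confusions. The toy confusion table below is invented; a real system would derive confusable words from pronunciation information (e.g., a dictionary such as CMUdict) rather than a hand-written list:

```python
import random

# Invented homophone table for illustration only.
CONFUSIONS = {"pain": ["pane"], "weak": ["week"], "ache": ["eight"]}

def augment(sentence: str, rate: float = 1.0, seed: int = 0) -> str:
    """Swap each confusable word for a homophone with probability `rate`."""
    rng = random.Random(seed)
    out = []
    for word in sentence.split():
        alts = CONFUSIONS.get(word.lower())
        if alts and rng.random() < rate:
            out.append(rng.choice(alts))
        else:
            out.append(word)
    return " ".join(out)

print(augment("the pain felt weak"))  # the pane felt week
```

Training on a mix of original and augmented sentences encourages the classifier to treat phonetically confusable strings as near-equivalent, without requiring any in-domain audio.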
Exploring Mimic Loss for Robust ASR
We have recently devised a non-local criterion, called mimic loss, for training a model for speech denoising. This objective, which uses feedback from a senone classifier trained on clean speech, ensures that the denoising model produces spectral features that are useful for speech recognition. We combine this knowledge transfer technique with the traditional local criterion to train the speech enhancer. We achieve a new state-of-the-art for the CHiME-2 corpus by feeding the denoised outputs to an off-the-shelf Kaldi recipe. An in-depth analysis of mimic loss reveals that this performance correlates with better reproduction of consonants with low average energy.
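A minimal sketch of combining the local criterion with mimic loss, assuming a frozen classifier trained on clean speech; here that classifier is stubbed as a fixed random linear layer, and the feature dimensions are arbitrary:

```python
import numpy as np

# Toy combined objective: local spectral MSE plus a mimic term that
# matches the frozen classifier's outputs on denoised vs. clean features.
rng = np.random.default_rng(1)
W = rng.normal(size=(10, 40))  # stand-in "senone classifier" weights (frozen)

def senone_posteriors(feats):
    logits = W @ feats
    e = np.exp(logits - logits.max())
    return e / e.sum()

def combined_loss(denoised, clean, alpha=0.5):
    local = np.mean((denoised - clean) ** 2)            # local criterion
    mimic = np.mean((senone_posteriors(denoised)
                     - senone_posteriors(clean)) ** 2)  # mimic loss
    return local + alpha * mimic

clean = rng.normal(size=40)
denoised = clean + 0.1 * rng.normal(size=40)
print(combined_loss(denoised, clean) > combined_loss(clean, clean))  # True
```

The mimic term is what makes the criterion non-local: it penalizes denoised features that a recognition-oriented classifier would treat differently from clean speech, even when the raw spectral error is small.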
Explicitly Incorporating Tense/Aspect to Facilitate Creation of New Virtual Patients
The Virtual Patient project has collected a fair amount of data from student interactions with a patient presenting with back pain, but there is a desire to include a more diverse array of patients. With adequate training examples, treating the question identification task as a single label classification problem has been fairly successful. However, the current approach is not expected to work well for identifying the novel questions that are important for patients with different circumstances, because these new questions have little training support. Exploring the label sets reveals some generalities across patients, including the importance of temporal properties of the symptoms. Including temporal information in the canonical question representations may allow us to leverage external data to mitigate the data sparsity issue for questions unique to new patients. I will solicit feedback on an approach to create a frame-like question representation that incorporates this temporal information, as revealed by the tense and linguistic aspect of clauses in the queries.
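One hypothetical shape such a frame-like representation could take is shown below; the slot names (predicate, topic, tense, aspect) are illustrative guesses for discussion, not the project's finalized schema:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical frame for a virtual-patient question, with explicit
# tense/aspect slots. Slot names and values are illustrative.
@dataclass
class QuestionFrame:
    predicate: str                 # core question act, e.g. "ask-about"
    topic: str                     # symptom or attribute under discussion
    tense: str                     # "past" | "present" | "future"
    aspect: Optional[str] = None   # e.g. "progressive", "perfect"

# "Does your back still hurt?" vs. "Had the pain been getting worse?"
q1 = QuestionFrame("ask-about", "back-pain", tense="present")
q2 = QuestionFrame("ask-about", "pain-progression",
                   tense="past", aspect="perfect-progressive")
print(q1.tense, q2.aspect)  # present perfect-progressive
```

Factoring tense and aspect out of the flat label makes questions that differ only in temporal framing share most of their representation, which is what lets external data help with rarely seen patient-specific questions.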