I will be presenting the work I’ve been doing on my QP2, in which I am attempting to use RSA (Rational Speech Act) approaches to improve a question generation model. This work is an extension of previous work on Transparent Interactive Semantic Parsing, in which we develop a dialogue agent that helps a human user query a knowledge base in natural language. The dialogue agent parses an NL utterance into a SPARQL query, decomposes it into pieces, retrieves answers, then translates the entire process into a series of natural language sub-questions so that the user can validate the results or make corrections as necessary. The current work focuses on the sub-question generation sub-task, in which it is very important for the question to accurately and coherently represent the meaning of its SPARQL query. To this end, I experiment with RSA-style approaches that explicitly model a listener to improve the generator. Specifically, in this work I focus on a “reconstructor”-based method in which a listener model is trained to recover the original meaning representation (SPARQL query) from the output of a base speaker model. I will show my experiments with self-training using the reconstructor-based model and detail my in-progress work with a “distractor”-based approach, in which the model attempts to generate an utterance that distinguishes an input from possible distractor inputs.
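As a rough illustration of the distractor-based idea, the sketch below reranks a base speaker's candidate questions by how well a literal listener could pick out the target query from a distractor. It is a toy instance of the standard RSA recipe, not the talk's actual model: the probability table, the query names, and the functions `literal_listener` and `pragmatic_speaker` are all hypothetical.

```python
def literal_listener(s0, utterance, meanings):
    """L0(m | u): renormalize the base speaker's S0(u | m) over candidate meanings.

    Assumes a uniform prior over meanings (an assumption of this sketch).
    """
    scores = {m: s0[m].get(utterance, 0.0) for m in meanings}
    total = sum(scores.values())
    return {m: (v / total if total > 0 else 0.0) for m, v in scores.items()}


def pragmatic_speaker(s0, target, distractors, alpha=1.0):
    """S1(u | m) proportional to S0(u | m) * L0(m | u) ** alpha."""
    meanings = [target] + list(distractors)
    scores = {}
    for utterance, p_base in s0[target].items():
        p_listener = literal_listener(s0, utterance, meanings)[target]
        scores[utterance] = p_base * (p_listener ** alpha)
    total = sum(scores.values())
    return {u: v / total for u, v in scores.items()}


# Hypothetical base-speaker probabilities S0(question | query); made up for illustration.
s0 = {
    "spouse_query": {"Who is X related to?": 0.6, "Who is X married to?": 0.4},
    "child_query": {"Who is X related to?": 0.7, "Who are X's children?": 0.3},
}

s1 = pragmatic_speaker(s0, "spouse_query", ["child_query"])
best = max(s1, key=s1.get)
```

In this toy table the base speaker prefers the vaguer question, while the pragmatic speaker prefers the one that the distractor query would not have produced.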
Nanjiang will workshop plans for her dissertation, which will be on explaining NLI disagreement.
In the last few years, deep learning approaches based on the pretraining/finetuning paradigm have become state-of-the-art on a number of language tasks. Due to the success of pretrained neural language models, the following question has been raised: to what extent can good general linguistic representations be learned from language modeling alone? One line of research that aims to test this treats pretrained neural language models as linguistic experiment subjects, using the probabilities output by neural models as a proxy for acceptability on linguistic data in minimal pairs. Following this approach, I will present tests of GPT-2 on data from one particular cataphora study, and will also discuss ongoing work in this vein.
Recent advances in End-to-End (E2E) Spoken Language Understanding (SLU) have been primarily due to effective pretraining of speech representations. One such pretraining paradigm is the distillation of semantic knowledge from state-of-the-art text-based models like BERT to speech encoder neural networks. This work is a step towards doing the same in a much more efficient and fine-grained manner, where we align speech embeddings and BERT embeddings on a token-by-token basis. We introduce a simple yet novel technique that uses a cross-modal attention mechanism to extract token-level contextual embeddings from a speech encoder such that these can be directly compared and aligned with BERT-based contextual embeddings. This alignment is performed using a novel tokenwise contrastive loss. Fine-tuning such a pretrained model to perform intent recognition directly from speech yields state-of-the-art performance on two widely used SLU datasets. Our model improves further when fine-tuned with additional regularization using SpecAugment, especially when speech is noisy, giving an absolute improvement as high as 8% over previous results.
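The abstract does not spell out the tokenwise contrastive loss, so the following is only a minimal sketch of one common form such a loss can take: an InfoNCE-style objective in which each speech-side token embedding should be closer to the text embedding of the same token than to the other tokens in the utterance. The function names, the temperature value, and the toy 2-dimensional embeddings are assumptions for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def tokenwise_contrastive_loss(speech, text, temperature=0.1):
    """Mean over tokens i of -log softmax_j(sim(speech_i, text_j) / T)[i].

    speech[i] and text[i] are assumed to be embeddings of the same token;
    the other text tokens in the utterance serve as negatives.
    """
    assert len(speech) == len(text)
    losses = []
    for i, s in enumerate(speech):
        logits = [cosine(s, t) / temperature for t in text]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings (made up): two tokens in a 2-dimensional space.
text_emb = [[1.0, 0.0], [0.0, 1.0]]
speech_emb = [[0.9, 0.1], [0.1, 0.9]]

aligned_loss = tokenwise_contrastive_loss(speech_emb, text_emb)
misaligned_loss = tokenwise_contrastive_loss(speech_emb[::-1], text_emb)
```

With these values the loss is close to zero when speech and text tokens line up and much larger when the speech tokens are swapped, which is the behavior an alignment objective of this kind is meant to have.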
Complex question answering (CQA) requires multi-hop reasoning to combine multiple pieces of evidence, ideally from different knowledge sources. Considering the insufficient labeled data in a single knowledge source and the expense of human annotation, we study knowledge transfer for CQA between heterogeneous sources, including a text corpus and a knowledge base (KB). To facilitate knowledge transfer between sources, we first propose a unified framework, SimultQA, to bridge KBQA and TextQA systems, which can leverage supervision from both sources. By conducting experiments on CWQ and HotpotQA, two popular datasets originally designed for KBQA and TextQA respectively, we explore how knowledge is transferred between sources following the pre-training and fine-tuning paradigm, and find that knowledge transfer between heterogeneous sources consistently improves QA performance. We also conduct fine-grained analysis and hybrid evaluation experiments to further explain what knowledge has been transferred.
On Tuesday, Ron Chen and I will discuss our ongoing work on the Alexa Prize TaskBot Challenge. The competition, which is currently in the semi-finals stage, involves 9 teams that are developing taskbots to help an Alexa user work through a recipe or DIY task in a step-by-step, engaging manner. We will give a demonstration of our taskbot and outline our efforts on topics including dialogue management, response generation, question answering, and user engagement. We hope to solicit feedback both on technical aspects of the work and on ways in which the bot can be made more engaging and intuitive for users.
Analyzing the predictive power of neural LM surprisal
This work presents an in-depth analysis of an observation that contradicts the findings of recent work in computational psycholinguistics, namely that smaller GPT-2 models that show higher test perplexity nonetheless generate surprisal estimates that are more predictive of human reading times. Analysis of the surprisal values shows that rare proper nouns, which are typically tokenized into multiple subword tokens, are systematically assigned lower surprisal values by the larger GPT-2 models. A comparison of residual errors from regression models fit to reading times reveals that regression models with surprisal predictors from smaller GPT-2 models have significantly lower mean absolute errors on words that are tokenized into multiple tokens, while this trend is not observed on words that are kept intact. These results indicate that the ability of larger GPT-2 models to predict internal pieces of rare words more accurately makes their surprisal estimates deviate from humanlike expectations that manifest in self-paced reading times and eye-gaze durations.
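To make the multi-token point concrete: by the chain rule, the surprisal of a word split into several subword tokens is the sum of the surprisals of its pieces, so a model that predicts the internal pieces more confidently assigns the whole word a lower surprisal. The sketch below shows that bookkeeping only; the tokenization and token probabilities are made up for illustration and do not come from any GPT-2 model.

```python
import math


def word_surprisals(token_probs, word_ids):
    """Sum -log2 p(token | context) over the subword tokens of each word.

    token_probs[i] is the model probability of token i given its left context;
    word_ids[i] is the index of the word that token i belongs to.
    """
    totals = {}
    for prob, word in zip(token_probs, word_ids):
        totals[word] = totals.get(word, 0.0) - math.log2(prob)
    return [totals[word] for word in sorted(totals)]


# Hypothetical tokenization of "The Zoltan slept": the rare name is split in three.
tokens = ["The", " Z", "olt", "an", " slept"]
word_ids = [0, 1, 1, 1, 2]

# Made-up probabilities: the "larger" model is more confident on the
# word-internal pieces "olt" and "an" once it has seen " Z".
small_model_probs = [0.5, 0.01, 0.5, 0.25, 0.125]
large_model_probs = [0.5, 0.01, 0.9, 0.9, 0.125]

small = word_surprisals(small_model_probs, word_ids)
large = word_surprisals(large_model_probs, word_ids)
```

Here the two hypothetical models agree on the single-token words and differ only on the rare name, where the more confident internal predictions lower the word-level surprisal.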
Recent work in natural language generation has seen an increase in end-to-end neural network model usage. We report on ongoing work exploring how well these models can generate discourse that is coherent while still preserving the content of the input. We exemplify this work with results on the generation of discourses by the widely used model BART, which we fine-tune on texts reconstructed from the Penn Discourse Treebank. These texts are structured by explicit and implicit discourse connectives, e.g. ‘but’, ‘while’, ‘however’. We show that encoding in the input the discourse relation to be expressed by the connective, e.g. ‘Contingency Cause Result’, improves how well the model expresses the intended discourse relation, including whether the connective is implicit or explicit. Metrics inspired by psycholinguistic results are discussed.
Human sentence processing appears to require assembling the meanings of words into precise interpretations, a process that can be described in terms of semantic composition operations such as extraction and argument attachment. Using a set of broad-coverage psycholinguistic corpora with annotations from a generalized categorial grammar (Nguyen et al., 2012), we test the extent to which such composition operations influence self-paced reading times, eye-tracking measures, and fMRI BOLD signal. We find evidence for effects from several operations such as argument attachment and extraction; the latter effect is confirmed in a separate test on held-out data. Our results suggest that composition operations may play an explicit role in the construction of meaning over the course of sentence processing.
Computational Models of Sentence Processing and Syntactic Acquisition
This talk will provide a survey of our recent work on models of sentence processing and syntactic acquisition. First, this talk introduces an incremental left-corner parser that incorporates information about common linguistic abstractions such as syntactic categories, predicate-argument structure, and morphological rules as a computational-level model of sentence processing. Experimental results show that surprisal estimates from the proposed processing model deliver comparable and often superior fits to self-paced reading and eye-tracking data compared to those from pre-trained neural language models, suggesting that the strong linguistic generalizations made by the proposed model may help predict humanlike processing costs that manifest in latency-based measures. Subsequently, this talk presents a neural PCFG induction model that allows a clean manipulation of the influence of subword information in grammar induction. Experiments on child-directed speech demonstrate first that the incorporation of subword information results in more accurate grammars with categories that word-based induction models have difficulty finding, and second that this effect is amplified in morphologically richer languages that rely on functional affixes to express grammatical relations.