Are syntactic categories real?
People can express novel, precise, complex ideas (plans with sophisticated contingencies, predictive models of interrelated uncertain events, and more), which seems to suggest a formal, compositional semantics in which sentences are divided into categories with associated semantic functions. But state-of-the-art NLP systems (transformers like BERT and GPT-3) don’t work like that. This talk will review evidence about syntactic categories from sentence processing experiments and grammar induction simulations conducted over the past few years in the OSU computational cognitive modeling lab, and hazard some guesses about the cognitive status of syntactic categories.
Title: Semi-Supervised Heterogeneous Feature Learning in a Large-Scale Conversational AI System
Abstract: This paper aims to improve an important downstream component of a large-scale industrial conversational AI system. The component is called the Skill Routing Component (SRC) and is responsible for a variety of tasks. As the last component before executing user requests, SRC utilizes many textual and symbolic features obtained from heterogeneous upstream components like automatic speech recognition (ASR) and natural language understanding (NLU), which calls for an efficient way to utilize these features. To achieve this, we propose a unified transformer model which, in contrast to traditional methods, encodes the heterogeneous features into a shared latent space. Because there is an inherent connection between SRC tasks and upstream NLU tasks, we also utilize noisy NLU data for pre-training the unified SRC model via specifically curated objectives and fine-tune it separately on the different SRC tasks. Our method shows an average improvement of 1.8% on four SRC tasks over the state-of-the-art baseline.
Title: Towards end-to-end integration of dialog history for improved spoken language understanding.
Abstract: Dialog history plays an important role in spoken language understanding (SLU) performance in a dialog system. For end-to-end (E2E) SLU, previous work has used dialog history in text form, which makes the model dependent on a cascaded automatic speech recognizer (ASR). This undermines the benefits of an E2E system, which is intended to be compact and robust to ASR errors. In this work, we propose a hierarchical conversation model that is capable of directly using dialog history in speech form, making it fully E2E. We also distill semantic knowledge from the available gold conversation transcripts by jointly training a similar text-based conversation model with an explicit tying of acoustic and semantic embeddings. We also propose a novel technique that we call DropFrame to deal with the long training time incurred by adding dialog history in an E2E manner. On the HarperValleyBank dialog dataset, our E2E history integration outperforms a history-independent baseline by 7.7% absolute F1 score on the task of dialog action recognition. Our model performs competitively with the state-of-the-art history-based cascaded baseline, but uses 48% fewer parameters. In the absence of gold transcripts to fine-tune an ASR model, our model outperforms this baseline by a significant margin of 10% absolute F1 score.
Byung-Doh Oh will be presenting his work on unsupervised grammar induction, followed by some attempts to extend the project.
Character-based PCFG Induction for Modeling the Syntactic Acquisition of Morphologically Rich Languages
Unsupervised PCFG induction models, which build syntactic structures from raw text, can be used to evaluate the extent to which syntactic knowledge can be acquired from distributional information alone. However, many state-of-the-art PCFG induction models are word-based, meaning that they cannot directly inspect functional affixes, which may provide crucial information for syntactic acquisition in child learners. This work first introduces a neural PCFG induction model that allows a clean ablation of the influence of subword information in grammar induction. Experiments on child-directed speech demonstrate first that the incorporation of subword information results in more accurate grammars with categories that word-based induction models have difficulty finding, and second that this effect is amplified in morphologically richer languages that rely on functional affixes to express grammatical relations. A subsequent evaluation on multilingual treebanks shows that the model with subword information achieves state-of-the-art results on many languages, further supporting a distributional model of syntactic acquisition.
Nanjiang Jiang will be workshopping her project on natural language inference annotations.
Willy Cheung will lead a discussion of Yu and Ettinger’s (EMNLP-20) paper to help prepare for Allyson Ettinger’s upcoming invited talk on September 24:
Assessing Phrasal Representation and Composition in Transformers
Lang Yu, Allyson Ettinger
Deep transformer models have pushed performance on NLP tasks to new limits, suggesting sophisticated treatment of complex linguistic inputs, such as phrases. However, we have limited understanding of how these models handle representation of phrases, and whether this reflects sophisticated composition of phrase meaning like that done by humans. In this paper, we present systematic analysis of phrasal representations in state-of-the-art pre-trained transformers. We use tests leveraging human judgments of phrase similarity and meaning shift, and compare results before and after control of word overlap, to tease apart lexical effects versus composition effects. We find that phrase representation in these models relies heavily on word content, with little evidence of nuanced composition. We also identify variations in phrase representation quality across models, layers, and representation types, and make corresponding recommendations for usage of representations from these models.
Ash Lewis and Lingbo Mo will present their work with Huan Sun and Mike White titled “Transparent Dialogue for Step-by-Step Semantic Parse Correction”. Here’s the abstract:
Existing studies on semantic parsing focus primarily on mapping a natural-language utterance to a corresponding logical form in a one-shot setting. However, because natural language can contain a great deal of ambiguity and variability, this is a difficult challenge. In this work, we investigate an interactive semantic parsing framework, which shows the user how a complex question is answered step-by-step and enables them to make corrections through natural-language feedback to each step in order to increase the clarity and accuracy of parses. We focus on question answering over knowledge bases (KBQA) as an instantiation of our framework, and construct INSPIRED, a transparent dialogue dataset with complex questions, predicted logical forms, and step-by-step, natural-language feedback. Our experiments show that the interactive framework with human feedback can significantly improve the overall parse accuracy. Furthermore, we develop a pipeline for dialogue simulation to apply the framework to various other state-of-the-art models for KBQA and substantially improve their performance as well, which sheds light on the generalizability of this framework for other parsers without further annotation effort.
Xintong Li will present his work with Symon Jory Stevens-Guille, Aleksandre Maskharashvili and me on self-training for compositional neural NLG, including material from our upcoming INLG-21 paper along with some additional background.
Here’s the abstract for our INLG paper:
Neural approaches to natural language generation in task-oriented dialogue have typically required large amounts of annotated training data to achieve satisfactory performance, especially when generating from compositional inputs. To address this issue, we show that self-training enhanced with constrained decoding yields large gains in data efficiency on a conversational weather dataset that employs compositional meaning representations. In particular, our experiments indicate that self-training with constrained decoding can enable sequence-to-sequence models to achieve satisfactory quality using vanilla decoding with five to ten times less data than with an ordinary supervised baseline; moreover, by leveraging pretrained models, data efficiency can be increased further to fifty times. We confirm the main automatic results with human evaluations and show that they extend to an enhanced, compositional version of the E2E dataset. The end result is an approach that makes it possible to achieve acceptable performance on compositional NLG tasks using hundreds rather than tens of thousands of training samples.
Given raw (in our case, textual) sentences as input, the Paradigm Discovery Problem (PDP) (Elsner et al., 2019; Erdmann et al., 2020) involves a bi-directional clustering of words into paradigms and cells. For instance, solving the PDP requires one to determine that ring and rang belong to the same paradigm, while bring and bang do not, and that rang and banged belong to the same cell, as they realize the same morphosyntactic property set, i.e., past tense. Solving the PDP is necessary in order to bootstrap to solving what’s often referred to as the Paradigm Cell Filling Problem (PCFP) (Ackerman et al., 2009), i.e., predicting forms that fill as-yet-unrealized cells in partially attested paradigms. That is to say, if I want the plural of thesis, but have only seen the singular, I can only predict theses if I’ve solved the PDP in such a way that allows me to make generalizations regarding how number is realized.
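To make the paradigm/cell distinction concrete, here is a minimal Python sketch of what a solved PDP looks like as a data structure, and how it supports a toy version of the PCFP. It is only an illustration, not the system of Erdmann et al. (2020): the table is filled in by hand rather than induced, the cell labels PRESENT, SG and PL and the crisis/crises pair are assumptions added for the example, and the helper names (same_paradigm, same_cell, suffix_rule, fill_cell) are hypothetical.

```python
from collections import defaultdict

# Hand-built stand-in for a solved PDP: every form gets a (paradigm, cell) pair.
# ring, rang, bring, bang, banged and thesis come from the text; the labels
# PRESENT/SG/PL and the crisis/crises pair are assumptions for illustration.
ANALYSIS = {
    "ring": ("RING", "PRESENT"),
    "rang": ("RING", "PAST"),
    "bang": ("BANG", "PRESENT"),
    "banged": ("BANG", "PAST"),
    "bring": ("BRING", "PRESENT"),
    "crisis": ("CRISIS", "SG"),
    "crises": ("CRISIS", "PL"),
    "thesis": ("THESIS", "SG"),
}


def same_paradigm(a, b):
    """True if both forms were clustered into the same paradigm."""
    return ANALYSIS[a][0] == ANALYSIS[b][0]


def same_cell(a, b):
    """True if both forms realize the same morphosyntactic property set."""
    return ANALYSIS[a][1] == ANALYSIS[b][1]


def suffix_rule(source, target):
    """Strip the longest common prefix and return (old_suffix, new_suffix)."""
    i = 0
    while i < min(len(source), len(target)) and source[i] == target[i]:
        i += 1
    return source[i:], target[i:]


def fill_cell(form, target_cell):
    """Toy PCFP: predict the form for target_cell by analogy to a paradigm
    that attests both the source cell and the target cell."""
    _, source_cell = ANALYSIS[form]
    by_paradigm = defaultdict(dict)
    for f, (p, c) in ANALYSIS.items():
        by_paradigm[p][c] = f
    for cells in by_paradigm.values():
        if source_cell in cells and target_cell in cells:
            old, new = suffix_rule(cells[source_cell], cells[target_cell])
            if form.endswith(old):
                return form[: len(form) - len(old)] + new
    return None  # no attested paradigm licenses a generalization


print(same_paradigm("ring", "rang"))   # True
print(same_paradigm("bring", "bang"))  # False
print(same_cell("rang", "banged"))     # True
print(fill_cell("thesis", "PL"))       # theses
```

The prediction of theses only works because the table already places crisis and crises in one paradigm and labels their cells as SG and PL; without that clustering there is no number generalization to apply, which is the sense in which the PCFP depends on the PDP.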
Two forthcoming works address constrained versions of the PDP by focusing on a single part of speech at a time (Erdmann et al., 2020; Kann et al., 2020). For my dissertation, I am trying to adapt the system of Erdmann et al. (2020) to handle the unconstrained PDP by addressing scalability and overfitting issues which lock the system into poor predictions regarding the size of paradigms and prematurely eliminate potentially rewarding regions of the search space. This will be a very informal talk; I’m just looking to get some feedback on some issues I keep running into.
High frequency marker categories in grammar induction
High frequency marker words have been shown to be crucial in first language acquisition, where they provide reliable clues for speech segmentation and grammatical categorization of words. Recent work on model selection for grammar induction has also hinted at a similar role played by high frequency marker words in distributionally inducing grammars. In this work, we first expand the notion of high frequency marker words to high frequency marker categories to include languages where grammatical relations between words are expressed by morphology, not word order. Through analysis of data from previous work and experiments with novel induction models, this work shows that high frequency marker categories are the main driver of accurate grammar induction.