Byung-Doh Oh will be presenting his work on unsupervised grammar induction, followed by some attempts to extend the project.
Character-based PCFG Induction for Modeling the Syntactic Acquisition of Morphologically Rich Languages
Unsupervised PCFG induction models, which build syntactic structures from raw text, can be used to evaluate the extent to which syntactic knowledge can be acquired from distributional information alone. However, many state-of-the-art PCFG induction models are word-based, meaning that they cannot directly inspect functional affixes, which may provide crucial information for syntactic acquisition in child learners. This work first introduces a neural PCFG induction model that allows a clean ablation of the influence of subword information in grammar induction. Experiments on child-directed speech demonstrate first that the incorporation of subword information results in more accurate grammars with categories that word-based induction models have difficulty finding, and second that this effect is amplified in morphologically richer languages that rely on functional affixes to express grammatical relations. A subsequent evaluation on multilingual treebanks shows that the model with subword information achieves state-of-the-art results on many languages, further supporting a distributional model of syntactic acquisition.
Nanjiang Jiang will be workshopping her project on natural language inference annotations.
Willy Cheung will lead a discussion of Yu and Ettinger’s (EMNLP-20) paper to help prepare for Allyson Ettinger’s upcoming invited talk on September 24:
Assessing Phrasal Representation and Composition in Transformers
Lang Yu, Allyson Ettinger
Deep transformer models have pushed performance on NLP tasks to new limits, suggesting sophisticated treatment of complex linguistic inputs, such as phrases. However, we have limited understanding of how these models handle representation of phrases, and whether this reflects sophisticated composition of phrase meaning like that done by humans. In this paper, we present systematic analysis of phrasal representations in state-of-the-art pre-trained transformers. We use tests leveraging human judgments of phrase similarity and meaning shift, and compare results before and after control of word overlap, to tease apart lexical effects versus composition effects. We find that phrase representation in these models relies heavily on word content, with little evidence of nuanced composition. We also identify variations in phrase representation quality across models, layers, and representation types, and make corresponding recommendations for usage of representations from these models.
Ash Lewis and Lingbo Mo will present their work with Huan Sun and Mike White titled “Transparent Dialogue for Step-by-Step Semantic Parse Correction”. Here’s the abstract:
Existing studies on semantic parsing focus primarily on mapping a natural-language utterance to a corresponding logical form in a one-shot setting. However, because natural language can contain a great deal of ambiguity and variability, this is a difficult challenge. In this work, we investigate an interactive semantic parsing framework, which shows the user how a complex question is answered step-by-step and enables them to make corrections through natural-language feedback to each step in order to increase the clarity and accuracy of parses. We focus on question answering over knowledge bases (KBQA) as an instantiation of our framework, and construct INSPIRED, a transparent dialogue dataset with complex questions, predicted logical forms, and step-by-step, natural-language feedback. Our experiments show that the interactive framework with human feedback can significantly improve the overall parse accuracy. Furthermore, we develop a pipeline for dialogue simulation to apply the framework to various other state-of-the-art KBQA models and substantially improve their performance as well, which sheds light on the generalizability of this framework for other parsers without further annotation effort.