Clippers today: Deblin Bagchi on Generative Adversarial Networks

At Clippers today, Deblin will present work on Generative Adversarial Networks that he and Adam Stiff have been pursuing over the summer.

Generative Adversarial Networks have been used extensively in computer vision to generate images from a noise distribution. It has been found that with conditional information, they can learn to map a source distribution to a target distribution. However, their expressive power remains untested in the domain of speech recognition.

Spectral mapping is a feature denoising technique in which a model learns to predict clean speech from noisy speech. In this work, we explore the effectiveness of adversarial training of a feedforward network-based (as well as convolutional network-based) spectral mapper that predicts clean speech frames from noisy context. Along the way we have run into some issues, which we will share, and we welcome comments and feedback on our future plans.
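
For readers unfamiliar with the setup, the sketch below shows what adversarial training of a spectral mapper can look like. This is a generic conditional-GAN illustration in PyTorch, not Deblin and Adam's actual system; the context window, layer sizes, and L1 weighting are all assumed for illustration.

```python
# Hedged sketch of adversarial spectral mapping (illustrative, not the
# presented system). Assumes noisy context windows of 11 frames x 257 bins
# and a clean 257-bin target frame; all sizes are made-up placeholders.
import torch
import torch.nn as nn

CTX, BINS = 11 * 257, 257

generator = nn.Sequential(            # feedforward spectral mapper
    nn.Linear(CTX, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, BINS),
)
discriminator = nn.Sequential(        # judges (context, frame) pairs
    nn.Linear(CTX + BINS, 512), nn.LeakyReLU(0.2),
    nn.Linear(512, 1), nn.Sigmoid(),
)

bce, l1 = nn.BCELoss(), nn.L1Loss()
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def train_step(noisy_ctx, clean_frame, l1_weight=100.0):
    """One adversarial update; l1_weight is an assumed hyperparameter."""
    fake = generator(noisy_ctx)
    # Discriminator: real (context, clean) vs. fake (context, generated).
    d_real = discriminator(torch.cat([noisy_ctx, clean_frame], dim=1))
    d_fake = discriminator(torch.cat([noisy_ctx, fake.detach()], dim=1))
    d_loss = bce(d_real, torch.ones_like(d_real)) + \
             bce(d_fake, torch.zeros_like(d_fake))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()
    # Generator: fool the discriminator while staying close to the target.
    d_fake = discriminator(torch.cat([noisy_ctx, fake], dim=1))
    g_loss = bce(d_fake, torch.ones_like(d_fake)) + \
             l1_weight * l1(fake, clean_frame)
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```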

Clippers Tuesday: Marie-Catherine de Marneffe on Automatically Drawing Inferences

At Clippers Tuesday, Marie-Catherine de Marneffe will be giving a dry run of an upcoming invited talk at the University of Geneva.

Automatically drawing inferences

Marie-Catherine de Marneffe
Linguistics department
The Ohio State University

When faced with a piece of text, humans understand far more than just the literal meaning of the words in the text. In our interactions, much of what we communicate is not said explicitly but rather inferred. However, extracting information that is expressed without actually being said remains an issue for NLP. For instance, given (1) and (2), we want to derive that people will generally infer from (1) that it is war, but from (2) that relocating species threatened by climate change is not a panacea, even though both events are embedded under “(s)he doesn’t believe”.

(1) The problem, I’m afraid, with my colleague here, he really doesn’t believe that it’s war.

(2) Transplanting an ecosystem can be risky, as history shows. Hellmann doesn’t believe that relocating species threatened by climate change is a panacea.

Automatically extracting systematic inferences of that kind is fundamental to a range of NLP tasks, including information extraction, opinion detection, and textual entailment. But surprisingly, at present the vast majority of information extraction systems work at the clause level and regard any event they find as true without taking into account the context in which the event appears in the sentence.

In this talk, I will discuss two case studies of extracting such inferences, to illustrate the general approach I take in my research: use linguistically-motivated features, conjoined with surface-level ones, to enable progress in achieving robust text understanding. First, I will look at how to automatically assess the veridicality of events — whether events described in a text are viewed as actual (as in (1)), non-actual (as in (2)) or uncertain. I will describe a statistical model that balances lexical features like hedges or negations with structural features and approximations of world knowledge, thereby providing a nuanced picture of the diverse factors that shape veridicality. Second, I will examine how to identify (dis)agreement in dialogue, where people rarely overtly (dis)agree with their interlocutor, but their opinion can nonetheless be inferred (in (1) for instance, we infer that the speaker disagrees with his colleague).
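
To give a toy sense of what balancing lexical cues with structural ones can look like, here is a minimal sketch of a feature-based veridicality classifier. The cue lists, features, and training examples are my own illustrative assumptions, not the statistical model from the talk.

```python
# Minimal sketch of a feature-based veridicality classifier (illustrative
# only; the cue lists and features are assumptions, not the actual model).
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

HEDGES = {"maybe", "perhaps", "possibly", "apparently"}
NEGATIONS = {"not", "n't", "never", "doesn't", "don't"}
BELIEF_PREDICATES = {"believe", "think", "suspect", "doubt"}

def features(sentence):
    toks = sentence.lower().split()
    return {
        "has_negation": any(t in NEGATIONS for t in toks),
        "has_hedge": any(t in HEDGES for t in toks),
        "under_belief_pred": any(t.strip(".,") in BELIEF_PREDICATES
                                 for t in toks),
    }

# Toy training data: veridicality labels for embedded events.
train = [("He doesn't believe that it's war.", "non-actual"),
         ("She knows that it is war.", "actual"),
         ("Maybe it is war.", "uncertain")]
vec = DictVectorizer()
X = vec.fit_transform([features(s) for s, _ in train])
clf = LogisticRegression().fit(X, [y for _, y in train])
print(clf.predict(vec.transform([features("He doesn't believe it.")])))
```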

Clippers Tuesday: Micha Elsner on Speech Segmentation with Neural Nets; David King on Disambiguating Coordination Ambiguities

At Clippers Tuesday, Micha Elsner and David King will be giving practice talks for EMNLP and for the Explainable Computational Intelligence Workshop at INLG, respectively:

Speech segmentation with a neural net model of working memory
Micha Elsner and Cory Shain

We present the first unsupervised LSTM speech segmenter as a cognitive model of the acquisition of words from unsegmented input. Cognitive biases toward phonological and syntactic predictability in speech are rooted in the limitations of human memory (Baddeley et al., 1998); compressed representations are easier to acquire and retain in memory. To model the biases introduced by these memory limitations, our system uses an LSTM-based encoder-decoder with a small number of hidden units, then searches for a segmentation that minimizes autoencoding loss. Linguistically meaningful segments (e.g. words) should share regular patterns of features that facilitate decoder performance in comparison to random segmentations, and we show that our learner discovers these patterns when trained on either phoneme sequences or raw acoustics. To our knowledge, ours is the first fully unsupervised system to be able to segment both symbolic and acoustic representations of speech.
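
The sketch below illustrates the core mechanism under stated assumptions (it is not the paper's implementation): a deliberately small LSTM encoder-decoder reconstructs each proposed segment, and the summed autoencoding loss scores a candidate segmentation; the search over boundary placements would sit on top of this score.

```python
# Sketch of the core idea: score a candidate segmentation by how well a
# small LSTM autoencoder reconstructs each proposed segment. Sizes and
# the reconstruction loss are illustrative assumptions, not the paper's.
import torch
import torch.nn as nn

class SegmentAutoencoder(nn.Module):
    def __init__(self, feat_dim=13, hidden=16):   # deliberately small memory
        super().__init__()
        self.enc = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.dec = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, feat_dim)

    def forward(self, segment):                   # (1, T, feat_dim)
        _, (h, _) = self.enc(segment)             # compress to one vector
        T = segment.size(1)
        dec_in = h.transpose(0, 1).expand(-1, T, -1)  # feed code at each step
        dec_out, _ = self.dec(dec_in)
        return self.out(dec_out)

def segmentation_loss(model, frames, boundaries):
    """Sum autoencoding losses over segments defined by boundary indices."""
    loss, mse = 0.0, nn.MSELoss(reduction="sum")
    for start, end in zip([0] + boundaries, boundaries + [frames.size(0)]):
        seg = frames[start:end].unsqueeze(0)
        loss = loss + mse(model(seg), seg)
    return loss  # the segmentation search seeks boundaries minimizing this
```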

A Simple Method for Clarifying Sentences with Coordination Ambiguities
Michael White, Manjuan Duan and David King

We present a simple, broad coverage method for clarifying the meaning of sentences with coordination ambiguities, a frequent cause of parse errors. For each of the two most likely parses involving a coordination ambiguity, we produce a disambiguating paraphrase that splits the sentence in two, with one conjunct appearing in each half, so that the span of each conjunct becomes clearer. In a validation study, we show that the method enables meaning judgments to be crowd-sourced with good reliability, achieving 83% accuracy at 80% coverage.
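
The splitting idea can be shown with a toy example (my sketch, not the authors' generation system): each parse's conjunct spans are substituted into separate copies of the sentence, so the two readings of "green eggs and ham" yield different paraphrase pairs.

```python
# Toy illustration of disambiguation by splitting (not the actual system):
# substitute each parse's conjunct spans into separate copies of the
# sentence so the span of each conjunct becomes explicit.
def split_paraphrase(template, conjuncts):
    """template has one '{}' slot where the coordination appeared."""
    return [template.format(c) for c in conjuncts]

sentence = "I like {}."
# Reading 1: [green eggs] and [ham] -- "green" modifies only "eggs".
print(split_paraphrase(sentence, ["green eggs", "ham"]))
# Reading 2: green [eggs and ham] -- "green" distributes over both.
print(split_paraphrase(sentence, ["green eggs", "green ham"]))
```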

Clippers Tuesday: Manirupa Das on Query Expansion for IR

At Clippers Tuesday, Manirupa will present “A Phrasal Embedding–based General Language Model for Query Expansion in Information Retrieval”:

Traditional knowledge graphs driven by knowledge bases can represent facts about and capture relationships among entities very well, thus performing quite accurately in factual information retrieval. However, in addressing the complex information needs of subjective queries requiring adaptive decision support, these systems can fall short, as they are not able to fully capture novel associations among potentially key concepts. In this work, we explore a novel use of language model–based document ranking to develop a fully unsupervised method for query expansion by associating documents with novel related concepts extracted from the text. To achieve this we extend the word embedding–based generalized language model of Ganguly et al. (2015) to employ phrasal embeddings, and evaluate its performance on an IR task using the TREC 2016 clinical decision support challenge dataset. Our model, used for query expansion both directly and via a feedback loop, shows statistically significant improvement not just over various baselines utilizing standard MeSH terms and UMLS concepts for query expansion (Rivas et al., 2014), but also over our word embedding–based language model baseline, built on top of a standard Okapi BM25–based document retrieval system.
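
As a rough illustration of embedding-based expansion (not the generalized language model itself, which folds the embeddings into document ranking), the sketch below adds each query phrase's nearest neighbors in an assumed phrasal embedding space; the vocabulary and vectors are toy placeholders.

```python
# Hedged sketch of embedding-based query expansion (illustrative only; the
# actual model extends Ganguly et al.'s generalized language model, which
# is not reproduced here). Assumes phrases map to unit-normalized vectors.
import numpy as np

def expand_query(query_phrases, phrase_vecs, k=3):
    """Add the k nearest-neighbor phrases of each query phrase."""
    names = list(phrase_vecs)
    mat = np.vstack([phrase_vecs[p] for p in names])   # (N, dim), unit norm
    expanded = list(query_phrases)
    for phrase in query_phrases:
        if phrase not in phrase_vecs:
            continue
        sims = mat @ phrase_vecs[phrase]               # cosine similarities
        for idx in np.argsort(-sims)[: k + 1]:         # +1 skips self-match
            if names[idx] != phrase and names[idx] not in expanded:
                expanded.append(names[idx])
    return expanded

# Usage with toy random vectors (real ones would come from a trained model):
rng = np.random.default_rng(0)
vocab = ["heart attack", "myocardial infarction", "chest pain", "aspirin"]
vecs = {}
for p in vocab:
    v = rng.normal(size=50)
    vecs[p] = v / np.linalg.norm(v)
print(expand_query(["heart attack"], vecs, k=2))
```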

NLP/AI, previously: Dan Garrette (Google) on CCG Parsing and Historical Document Transcription

We were pleased to host Dan Garrette from Google the previous Friday; he gave a talk in the NLP/AI series.

Title: Learning from Weak Supervision: Combinatory Categorial Grammars and Historical Document Transcription

Abstract:
As we move NLP toward domains and languages where supervised training resources are not available, there is an increased need to learn models from less annotation. In this talk, I will describe two projects on learning from weak supervision. First, I will discuss work on learning combinatory categorial grammars (CCGs) from incomplete information. In particular, I will show how universal, intrinsic properties of the CCG formalism can be encoded as priors and used to guide the learning of supertaggers and parsers. These universal priors can, in turn, be combined with corpus-specific knowledge derived from limited amounts of available annotation to further improve performance. Second, I will present work on learning to automatically transcribe historical documents that feature heavy use of code-switching and non-standard orthographies that include obsolete spellings, inconsistent diacritic use, typos, and archaic shorthands. Our state-of-the-art model is able to induce language-specific probabilistic mappings from language model data with standard orthography to the document-specific orthography on the page by jointly modeling both variant-preserving and normalized transcriptions. I will conclude with a discussion of how our work has opened up new avenues of research for scholars in the digital humanities, with a focus on transcribing books printed in Mexico in the 1500s.
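
As a toy illustration of the noisy-channel intuition behind the transcription work (my sketch, not Garrette's joint model), one can score candidate normalized readings of an observed historical spelling by a language-model prior times a character-level channel probability; all probabilities below are made up.

```python
# Toy noisy-channel sketch (all numbers invented): choose the normalized
# reading n maximizing P(n) * P(observed | n), where the channel captures
# historical orthographic variation such as intended "u" printed as "v".
LM = {"que": 0.02, "qve": 1e-6}          # assumed prior over normalized words
CHANNEL = {("u", "v"): 0.4, ("u", "u"): 0.6,
           ("q", "q"): 1.0, ("e", "e"): 1.0}

def channel_prob(normalized, observed):
    """P(observed string | intended normalized string), char by char."""
    if len(normalized) != len(observed):
        return 0.0
    p = 1.0
    for n, o in zip(normalized, observed):
        p *= CHANNEL.get((n, o), 1.0 if n == o else 1e-4)
    return p

def best_reading(observed, candidates):
    return max(candidates,
               key=lambda c: LM.get(c, 1e-9) * channel_prob(c, observed))

print(best_reading("qve", ["qve", "que"]))  # -> "que"
```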

Bio:
Dan is a research scientist at Google in NYC. He was previously a postdoctoral researcher at the University of Washington working with Luke Zettlemoyer, and obtained his PhD at the University of Texas at Austin under the direction of Jason Baldridge and Ray Mooney.

Host: Alan Ritter

Clippers Tuesday: Joo-Kyung Kim on Cross-lingual Transfer Learning for POS Tagging

This Tuesday, Joo-Kyung Kim will be talking about his current work on cross-lingual transfer learning for POS tagging:

POS tagging is a relatively easy task given sufficient training examples, but since each language has its own vocabulary space, parallel corpora are usually required to utilize POS datasets in different languages for transfer learning. In this talk, I introduce a cross-lingual transfer learning model for POS tagging, which utilizes language-general and language-specific representations with auxiliary objectives such as language-adversarial training and language modeling. Evaluating on POS datasets from Universal Dependencies 1.4, I present preliminary results showing that the proposed model can be used effectively for cross-lingual transfer learning without any parallel corpora or gazetteers.
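
A minimal sketch of the language-adversarial ingredient, under assumed sizes and architecture (this is not the exact model from the talk): a shared encoder feeds both a POS tagger and a language discriminator, and a gradient reversal layer pushes the shared representation toward being language-general.

```python
# Sketch of language-adversarial training for POS tagging (illustrative;
# the architecture and sizes are assumptions, not the presented model).
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None   # flip gradients into the encoder

class AdversarialTagger(nn.Module):
    def __init__(self, vocab=5000, emb=64, hidden=64, n_tags=17, n_langs=2):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.shared = nn.LSTM(emb, hidden, batch_first=True,
                              bidirectional=True)      # language-general
        self.tagger = nn.Linear(2 * hidden, n_tags)    # per-token POS tags
        self.lang_clf = nn.Linear(2 * hidden, n_langs) # adversary

    def forward(self, tokens, lam=1.0):
        h, _ = self.shared(self.emb(tokens))           # (B, T, 2*hidden)
        tag_logits = self.tagger(h)
        # Reversed gradients make the encoder *hurt* language prediction,
        # encouraging representations that transfer across languages.
        lang_logits = self.lang_clf(GradReverse.apply(h.mean(dim=1), lam))
        return tag_logits, lang_logits
```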

Clippers Tuesday: Kasia Hitczenko (UMd) on Using Prosody to Learn Sound Categories

This Tuesday, Kasia Hitczenko will be visiting from the University of Maryland:

Using prosody to learn sound categories

Infants must learn the sound categories of their language, but this is difficult because variability in speech causes overlap between categories and masks where the correct categories lie. This work investigates whether incorporating knowledge of systematic sources of that variability can improve sound category learning. I present two models that incorporate one such source, prosody, into existing models of sound category learning, and report preliminary results from one of them on simulated data.
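
The potential benefit of knowing about a systematic prosodic shift can be seen in a toy simulation (my illustration, not the models in the talk): a learner that subtracts an assumed phrase-final lengthening effect from a duration cue recovers the two categories more cleanly than one clustering the raw cue.

```python
# Hedged toy simulation of the general idea (not Hitczenko's models): if
# prosodic position systematically shifts a phonetic cue, normalizing the
# cue by prosodic context reduces category overlap for the learner.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
n = 500
category = rng.integers(0, 2, n)          # two true sound categories
final = rng.integers(0, 2, n)             # phrase-final position or not
# Cue = category mean + prosodic lengthening + noise (synthetic data).
cue = 100 + 30 * category + 40 * final + rng.normal(0, 10, n)

raw = GaussianMixture(2, random_state=0).fit_predict(cue.reshape(-1, 1))
normalized = cue - 40 * final             # learner assumed to know the shift
prosody_aware = GaussianMixture(2, random_state=0).fit_predict(
    normalized.reshape(-1, 1))

def accuracy(pred):                       # cluster labels may be swapped
    return max(np.mean(pred == category), np.mean(pred != category))

print(f"raw: {accuracy(raw):.2f}  normalized: {accuracy(prosody_aware):.2f}")
```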

Clippers Tuesday: David King on Morphological Reinflection

This Tuesday, David King will be talking about his ongoing work on morphological reinflection:

In a recent shared task, neural machine translation systems performed well at reinflecting a variety of languages (e.g. German, Hungarian, and Turkish), but not Russian. I will present preliminary attempts to analyze where the top-performing neural machine translation model still fails on Russian. Since these shortcomings are primarily related to a word’s semantics and sound change (i.e. phonological alternation), I hope to overcome these challenges using Russian word vectors and an additional character-level language model.

Clippers Tuesday: Adam Stiff on Speech Recognition from a Dynamical Systems Perspective

This Tuesday, Adam Stiff will be talking about his efforts to take a dynamical systems-based approach to speech recognition (yes, via spiking networks):

Speech can be viewed as a dynamical system (i.e. a continuous function from a state space onto itself, with state changing continuously through time), and in very broad terms, this perspective should be fairly uncontroversial (indeed, it is often the basis for models of speech production). It is, however, extremely impractical, due to the huge number of nonlinear variables involved, and the apparent lack of a framework for learning them. Thus, the tools developed by mathematicians to understand nonlinear dynamical systems have not been widely utilized in attempts at automated speech recognition. I’ll argue that the brain does employ such techniques, and that adapting them could produce benefits in terms of energy efficiency, scalability, and robustness to the problem of catastrophic forgetting in the face of ongoing learning. Furthermore, observation of “fast” (sub-millisecond) dynamics may theoretically offer some benefits for recognition accuracy, and act as a bottom-up factor in learning phone segmentation. I also hope to exhibit some results from an (ongoing) phone classification experiment, to identify constraints that should be respected by a successful implementation of some of these ideas.
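
For something concrete, the toy sketch below (my illustration, unrelated to the actual spiking-network work) evolves a fixed random nonlinear system under an input signal, in the spirit of echo state networks; a recognizer would read categories off the resulting state-space trajectories.

```python
# Toy illustration of the dynamical-systems view (assumptions throughout):
# a fixed random nonlinear system whose state evolves under an input
# signal, as in echo state networks.
import numpy as np

rng = np.random.default_rng(0)
N = 100                                    # state dimension (assumed)
W = rng.normal(0, 1, (N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # contractive dynamics
W_in = rng.normal(0, 1, (N, 1))

def evolve(signal, dt=0.1):
    """Integrate dx/dt = -x + tanh(Wx + W_in u) over the input signal."""
    x, states = np.zeros(N), []
    for u in signal:
        x = x + dt * (-x + np.tanh(W @ x + W_in[:, 0] * u))
        states.append(x.copy())
    return np.array(states)               # trajectory through state space

trajectory = evolve(np.sin(np.linspace(0, 10, 200)))  # toy "speech" input
print(trajectory.shape)                   # (200, 100)
```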