At Clippers Tuesday, Manirupa will present “A Phrasal Embedding–based General Language Model for Query Expansion in Information Retrieval”:
Traditional knowledge graphs driven by knowledge bases can represent facts about and capture relationships among entities very well, and thus perform quite accurately in factual information retrieval. However, in addressing the complex information needs of subjective queries requiring adaptive decision support, these systems can fall short, as they are unable to fully capture novel associations among potentially key concepts. In this work, we explore a novel use of language model–based document ranking to develop a fully unsupervised method for query expansion that associates documents with novel related concepts extracted from the text. To achieve this, we extend the word embedding-based generalized language model of Ganguly et al. (2015) to employ phrasal embeddings, and evaluate its performance on an IR task using the TREC 2016 clinical decision support challenge dataset. Our model, used for query expansion both directly and via a feedback loop, shows statistically significant improvement not only over various baselines that utilize standard MeSH terms and UMLS concepts for query expansion (Rivas et al., 2014), but also over our word embedding-based language model baseline, built on top of a standard Okapi BM25-based document retrieval system.
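As a rough, self-contained sketch of the core idea (not the actual model from the talk), embedding-based query expansion amounts to pulling in the nearest neighbors of a query phrase in embedding space. The phrases and vector values below are invented for illustration:

```python
from math import sqrt

# Toy phrase embeddings -- values are hypothetical, for illustration only.
embeddings = {
    "heart attack":          [0.9, 0.1, 0.0],
    "myocardial infarction": [0.85, 0.15, 0.05],
    "chest pain":            [0.7, 0.3, 0.1],
    "broken arm":            [0.0, 0.2, 0.9],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def expand_query(phrase, k=2, threshold=0.8):
    """Return up to k phrases whose embeddings are most similar to the query."""
    q = embeddings[phrase]
    scored = sorted(((p, cosine(q, v)) for p, v in embeddings.items() if p != phrase),
                    key=lambda pv: pv[1], reverse=True)
    return [p for p, s in scored[:k] if s >= threshold]

print(expand_query("heart attack"))  # -> ['myocardial infarction', 'chest pain']
```

In the full model, expansion terms like these would feed back into the BM25-based retrieval stage, either directly or via a feedback loop.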
This past Friday, we were pleased to host Dan Garrette from Google, who gave a talk in the NLP/AI series.
Title: Learning from Weak Supervision: Combinatory Categorial Grammars and Historical Document Transcription
As we move NLP toward domains and languages where supervised training resources are not available, there is an increased need to learn models from less annotation. In this talk, I will describe two projects on learning from weak supervision. First, I will discuss work on learning combinatory categorial grammars (CCGs) from incomplete information. In particular, I will show how universal, intrinsic properties of the CCG formalism can be encoded as priors and used to guide the learning of supertaggers and parsers. These universal priors can, in turn, be combined with corpus-specific knowledge derived from limited amounts of available annotation to further improve performance. Second, I will present work on learning to automatically transcribe historical documents that feature heavy use of code-switching and non-standard orthographies that include obsolete spellings, inconsistent diacritic use, typos, and archaic shorthands. Our state-of-the-art model is able to induce language-specific probabilistic mappings from language model data with standard orthography to the document-specific orthography on the page by jointly modeling both variant-preserving and normalized transcriptions. I will conclude with a discussion of how our work has opened up new avenues of research for scholars in the digital humanities, with a focus on transcribing books printed in Mexico in the 1500s.
Dan is a research scientist at Google in NYC. He was previously a postdoctoral researcher at the University of Washington working with Luke Zettlemoyer, and obtained his PhD at the University of Texas at Austin under the direction of Jason Baldridge and Ray Mooney.
This Tuesday, Joo-Kyung Kim will be talking about his current work on cross-lingual transfer learning for POS tagging:
POS tagging is a relatively easy task given sufficient training examples, but since each language has its own vocabulary space, parallel corpora are usually required to utilize POS datasets in different languages for transfer learning. In this talk, I introduce a cross-lingual transfer learning model for POS tagging, which utilizes language-general and language-specific representations with auxiliary objectives such as language-adversarial training and language modeling. Evaluating on POS datasets from Universal Dependencies 1.4, I show preliminary results indicating that the proposed model can be used effectively for cross-lingual transfer learning without any parallel corpora or gazetteers.
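Language-adversarial training is commonly implemented with a gradient reversal layer: a language discriminator tries to identify the input language, while the reversed gradient pushes the shared encoder toward language-general features. A minimal sketch of that layer (not the talk's actual model):

```python
class GradientReversal:
    """Identity in the forward pass; negates (and scales) the gradient in
    the backward pass. The discriminator above this layer learns to
    identify the language, while the reversed gradient trains the shared
    encoder below it to discard language-specific cues."""

    def __init__(self, lam=1.0):
        self.lam = lam  # strength of the adversarial signal

    def forward(self, x):
        return x  # no change to activations

    def backward(self, grad):
        return [-self.lam * g for g in grad]  # flip and scale the gradient

grl = GradientReversal(lam=0.5)
print(grl.backward([0.1, 0.2, -0.3]))  # gradient flipped and scaled
```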
This Tuesday, Kasia Hitczenko will be visiting from the University of Maryland:
Using prosody to learn sound categories
Infants must learn the sound categories of their language, but this is difficult because there is variability in speech that causes overlap between categories and masks where the correct categories are. This work investigates whether incorporating knowledge of these systematic sources of variability can improve sound category learning. I present two models that incorporate one such source of variability, namely prosody, into two existing models of sound category learning and present preliminary results on simulated data from one of these models.
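To make the distributional-learning setting concrete, here is a minimal two-category learner (standard mixture-of-Gaussians EM, not the models from the talk) run on simulated one-dimensional acoustic values from two overlapping categories:

```python
import random
from math import exp, sqrt, pi

random.seed(0)
# Simulated 1D acoustic values (e.g. a single cue) from two overlapping
# sound categories centered at 0.0 and 2.5.
data = ([random.gauss(0.0, 1.0) for _ in range(200)] +
        [random.gauss(2.5, 1.0) for _ in range(200)])

def pdf(x, m, s):
    return exp(-(x - m) ** 2 / (2 * s * s)) / (s * sqrt(2 * pi))

# Two-category Gaussian mixture, fit with EM.
mu, sigma, w = [-1.0, 1.0], [1.0, 1.0], [0.5, 0.5]
for _ in range(50):
    # E-step: how responsible is each category for each data point?
    resp = []
    for x in data:
        d = [w[k] * pdf(x, mu[k], sigma[k]) for k in range(2)]
        z = sum(d)
        resp.append([dk / z for dk in d])
    # M-step: re-estimate category parameters from the soft assignments.
    for k in range(2):
        n = sum(r[k] for r in resp)
        mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / n
        sigma[k] = sqrt(sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / n)
        w[k] = n / len(data)

print([round(m, 2) for m in sorted(mu)])  # recovered category means
```

The point of the talk's models is that systematic conditioning factors like prosody can explain away some of the overlap that makes this purely distributional learning hard.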
This Tuesday, David King will be talking about his ongoing work on morphological reinflection:
In a recent shared task, neural machine translation systems performed well at reinflecting a variety of languages (e.g. German, Hungarian, and Turkish), but not Russian. I will present preliminary attempts to analyze where the top-performing neural machine translation model still fails with Russian. Since these shortcomings are primarily related to a word’s semantics and sound change (i.e. phonological alternation), I hope to overcome these challenges using Russian word vectors and an additional character-level language model.
This Tuesday, Adam Stiff will be talking about his efforts to take a dynamical systems-based approach to speech recognition (yes, via spiking networks):
Speech can be viewed as a dynamical system (i.e. a continuous function from a state space onto itself, with state changing continuously through time), and in very broad terms, this perspective should be fairly uncontroversial (indeed, it is often the basis for models of speech production). It is, however, extremely impractical, due to the huge number of nonlinear variables involved, and the apparent lack of a framework for learning them. Thus, the tools developed by mathematicians to understand nonlinear dynamical systems have not been widely utilized in attempts at automated speech recognition. I’ll argue that the brain does employ such techniques, and that adapting them could produce benefits in terms of energy efficiency, scalability, and robustness to the problem of catastrophic forgetting in the face of ongoing learning. Furthermore, observation of “fast” (sub-millisecond) dynamics may theoretically offer some benefits for recognition accuracy, and act as a bottom-up factor in learning phone segmentation. I also hope to exhibit some results from an (ongoing) phone classification experiment, to identify constraints that should be respected by a successful implementation of some of these ideas.
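For a concrete feel for the kind of spiking dynamics involved, here is a minimal leaky integrate-and-fire neuron simulation (all parameters are illustrative, not taken from the talk):

```python
# Leaky integrate-and-fire neuron: membrane potential leaks toward rest,
# integrates a constant input, and emits a spike on threshold crossing.
dt = 0.0001                          # 0.1 ms time step
tau = 0.010                          # 10 ms membrane time constant
v_rest, v_thresh, v_reset = 0.0, 1.0, 0.0
drive = 1.2                          # constant input; steady state sits above threshold

v, spike_times = v_rest, []
for step in range(int(round(0.1 / dt))):     # simulate 100 ms
    v += dt / tau * (-(v - v_rest) + drive)  # leaky integration (Euler step)
    if v >= v_thresh:                        # threshold crossing -> spike
        spike_times.append(step * dt)
        v = v_reset                          # reset after the spike

print(len(spike_times), "spikes in 100 ms")
```

Even this toy neuron is a nonlinear dynamical system (because of the reset), which is the level of description the talk argues has been underused in speech recognition.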
At Clippers Tuesday, I’ll motivate a new approach to scope taking in combinatory categorial grammar and discuss progress and plans for implementing the approach (in collaboration with Jordan Needle, Carl Pollard, Simon Charlow and Dylan Bumford):
A long-standing puzzle in natural language semantics has been how to explain the exceptional scope behavior of indefinites. Charlow (2014) has recently shown that their exceptional scope behavior can be derived from a dynamic semantics treatment of indefinites, i.e. one where the function of indefinites is to introduce discourse referents into the evolving discourse context. To do so, he showed that (1) a monadic approach to dynamic semantics can be seamlessly integrated with Barker and Shan’s (2015) approach to scope taking in continuized grammars, and (2) once one does so, the exceptional scope of indefinites follows from the way the side effect of introducing a discourse referent survives the process of delimiting the scope of true quantifiers such as those expressed with ‘each’ and ‘every’.
To date, computationally implemented approaches to scope taking have not distinguished indefinites from true quantifiers in a way that accounts for their exceptional scope taking. Although Steedman (2011) has developed an account of indefinites’ exceptional scope taking by treating them as underspecified Skolem terms in a non-standard static semantics for Combinatory Categorial Grammar (CCG), this treatment has not been implemented in its full complexity. Moreover, as Barker and Shan point out, Steedman’s theory appears to undergenerate by not allowing true quantifiers to take scope from medial positions.
Barker and Shan offer a brief sketch of how their approach might be implemented, including how lifting can be invoked lazily to ensure parsing terminates. In this talk, I will show how their approach can be seamlessly combined with Steedman’s CCG and extended to include the first prototype implementation of Charlow’s semantics of indefinites, thereby yielding an approach that improves upon scope taking in CCG while retaining many of its attractive computational properties.
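The monadic idea can be miniaturized: meanings are state transformers over a discourse context, and an indefinite's side effect is to push a fresh referent onto that context. This toy sketch (emphatically not the prototype implementation discussed above) shows the shape of it:

```python
# A meaning is a function from a discourse context (a list of referents)
# to a (value, updated context) pair; bind threads the context through
# composition -- i.e., the State monad.

def unit(x):
    return lambda ctx: (x, ctx)

def bind(m, f):
    def run(ctx):
        x, ctx2 = m(ctx)
        return f(x)(ctx2)
    return run

def indefinite(noun):
    """'a <noun>': introduce a fresh discourse referent as a side effect."""
    def run(ctx):
        ref = f"{noun}_{len(ctx)}"
        return ref, ctx + [ref]
    return run

# "A farmer owns a donkey": both indefinites add referents to the context,
# and those referents survive in the output context for later anaphora.
sentence = bind(indefinite("farmer"),
                lambda f: bind(indefinite("donkey"),
                               lambda d: unit(("owns", f, d))))
value, context = sentence([])
print(value, context)
```

In Charlow's full system, the interesting part is that these referent-introducing side effects survive the delimiting of true quantifiers' scope, which is what yields indefinites' exceptional scope behavior.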
This Tuesday, Micha Elsner will be presenting preliminary work on neural network word segmentation:
Given a corpus of phonemically transcribed utterances with unknown word boundaries, how can a cognitive model extract the vocabulary? I propose a new model based on working memory: the model must balance phonological memory (remembering how to pronounce words) with syntactic memory (remembering the utterance it just heard). Simulating the memory with encoder-decoder RNNs, I use reinforcement learning to optimize the segmentations.
Why build yet another model of word segmentation? (Is this simply a buzzword-compatibility issue? A little bit, but…) I hope to show that this model provides a deeper cognitive account of the prior biases used in previous work, and that its noisy, error-prone reconstruction process makes it inherently robust to variation in its input.
This is work in progress, so don’t expect great things from me yet. However, I will demonstrate model performance slightly worse than Goldwater et al. (2009) on a standard dataset and discuss some directions for future work. Criticism, suggestions and thrown paper airplanes welcome.
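To see the memory trade-off in miniature, here is a toy objective (not the RNN model) that charges the phonological memory for each distinct word type stored and the syntactic memory for each word token in the utterance, then searches segmentations exhaustively:

```python
from itertools import product

def segmentations(s):
    """Enumerate every way to split s into contiguous words."""
    for cuts in product([False, True], repeat=len(s) - 1):
        words, start = [], 0
        for i, cut in enumerate(cuts, 1):
            if cut:
                words.append(s[start:i])
                start = i
        words.append(s[start:])
        yield words

def cost(words):
    """Toy memory cost: phonological memory pays per character of each
    distinct word type; syntactic memory pays per word token. The 1.5
    weight is an arbitrary illustrative trade-off."""
    return sum(len(w) for w in set(words)) + 1.5 * len(words)

utterance = "youwantyouwant"
best = min(segmentations(utterance), key=cost)
print(best)  # the reused chunk "youwant" wins
```

The encoder-decoder model replaces these hand-set costs with learned reconstruction losses, and reinforcement learning replaces the exhaustive search.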
This Tuesday, Denis Newman-Griffis will be presenting on learning embeddings for ontology concepts:
Recent work on embedding ontology concepts has relied on either expensive manual annotation or automated concept tagging methods that ignore the textual contexts around concepts. We propose a novel method for jointly learning concept, phrase, and word embeddings from an unlabeled text corpus, by using the representative phrases for ontology concepts as distant supervision. We learn embeddings for medical concepts in the Unified Medical Language System and general-domain concepts in YAGO, using a variety of corpora. Our embeddings show performance competitive with existing methods on concept similarity and relatedness tasks, while requiring no human corpus annotation and achieving more than 3x greater vocabulary coverage.
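The distant-supervision step can be pictured as rewriting the corpus so that representative phrases become concept tokens before standard embedding training; the lexicon and concept IDs below are made up for illustration:

```python
# Hypothetical concept lexicon: representative phrases -> concept IDs
# (the IDs here are invented for this example).
lexicon = {
    "heart attack": "C0027051",
    "high blood pressure": "C0020538",
}

def tag_concepts(tokens, lexicon, max_len=4):
    """Greedy longest-match replacement of representative phrases with
    concept IDs, so a standard embedding model then sees concepts,
    phrases, and words in shared contexts."""
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if n > 1 and phrase in lexicon:
                out.append(lexicon[phrase])  # emit the concept token
                i += n
                break
        else:
            out.append(tokens[i])            # no match: keep the word
            i += 1
    return out

print(tag_concepts("patient denies heart attack history".split(), lexicon))
```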
I’ll also be talking a bit about trying to build an analogy completion dataset for the biomedical domain.
This past Tuesday, 2/7, Evan Jaffe presented on his progress on the Virtual Patient project:
I’ll be discussing results on a baseline log-linear model and the improvement gained from using a simple embedding-similarity feature. I’ll also discuss motivation and related work, and the current status of implementing a simple CNN with padding and max pooling to do multiclass classification for this dataset.
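As a sketch of what a log-linear classifier with an embedding-similarity feature looks like (the candidate questions, feature values, and weights here are all invented, not from the project):

```python
from math import exp, sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# Hypothetical weights for the features [bias, word overlap, embedding similarity].
w = [0.1, 1.0, 2.0]
query_vec = [1.0, 0.0]  # toy embedding of the input utterance
candidates = [
    # (candidate class, toy embedding, word-overlap feature)
    ("Do you have chest pain?", [0.9, 0.1], 0.5),
    ("Any allergies?",          [0.0, 1.0], 0.2),
]

# Score each class, then normalize with a softmax.
scores = [sum(wi * fi for wi, fi in
              zip(w, [1.0, overlap, cosine(query_vec, vec)]))
          for _, vec, overlap in candidates]
exps = [exp(s) for s in scores]
probs = [e / sum(exps) for e in exps]
best = candidates[probs.index(max(probs))][0]
print(best)
```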