At Clippers Tuesday, Manirupa will present “A Phrasal Embedding–based General Language Model for Query Expansion in Information Retrieval”:
Traditional knowledge graphs driven by knowledge bases can represent facts about and capture relationships among entities very well, thus performing quite accurately in factual information retrieval. However, in addressing the complex information needs of subjective queries requiring adaptive decision support, these systems can fall short as they are not able to fully capture novel associations among potentially key concepts. In this work, we explore a novel use of language model–based document ranking to develop a fully unsupervised method for query expansion by associating documents with novel related concepts extracted from the text. To achieve this, we extend the word embedding-based generalized language model due to Ganguly et al. (2015) to employ phrasal embeddings, and evaluate its performance on an IR task using the TREC 2016 clinical decision support challenge dataset. Our model, used for query expansion both directly and via a feedback loop, shows statistically significant improvement not just over various baselines utilizing standard MeSH terms and UMLS concepts for query expansion (Rivas et al., 2014), but also over our word embedding-based language model baseline, built on top of a standard Okapi BM25-based document retrieval system.
This Tuesday, Joo-Kyung Kim will be talking about his current work on cross-lingual transfer learning for POS tagging:
POS tagging is a relatively easy task given sufficient training examples, but since each language has its own vocabulary space, parallel corpora are usually required to utilize POS datasets in different languages for transfer learning. In this talk, I introduce a cross-lingual transfer learning model for POS tagging, which utilizes language-general and language-specific representations with auxiliary objectives such as language-adversarial training and language modeling. Evaluating on POS datasets from Universal Dependencies 1.4, I show preliminary results that the proposed model can be effectively used for cross-lingual transfer learning without any parallel corpora or gazetteers.
This Tuesday, Kasia Hitczenko will be visiting from the University of Maryland:
Using prosody to learn sound categories
Infants must learn the sound categories of their language, but this is difficult because there is variability in speech that causes overlap between categories and masks where the correct categories are. This work investigates whether incorporating knowledge of these systematic sources of variability can improve sound category learning. I present two models that incorporate one such source of variability, namely prosody, into two existing models of sound category learning and present preliminary results on simulated data from one of these models.
This Tuesday, David King will be talking about his ongoing work on morphological reinflection:
In a recent shared task, neural machine translation systems performed well at reinflecting a variety of languages (e.g. German, Hungarian, and Turkish), but not Russian. I will present preliminary attempts to analyze where the top-performing neural machine translation model still fails with Russian. Since these shortcomings are primarily related to a word’s semantics and sound change (i.e. phonological alternation), I hope to overcome these challenges using Russian word vectors and an additional character-level language model.
This Tuesday, Adam Stiff will be talking about his efforts to take a dynamical systems-based approach to speech recognition (yes, via spiking networks):
Speech can be viewed as a dynamical system (i.e. a continuous function from a state space onto itself, with state changing continuously through time), and in very broad terms, this perspective should be fairly uncontroversial (indeed, it is often the basis for models of speech production). It is, however, extremely impractical, due to the huge number of nonlinear variables involved, and the apparent lack of a framework for learning them. Thus, the tools developed by mathematicians to understand nonlinear dynamical systems have not been widely utilized in attempts at automated speech recognition. I’ll argue that the brain does employ such techniques, and that adapting them could produce benefits in terms of energy efficiency, scalability, and robustness to the problem of catastrophic forgetting in the face of ongoing learning. Furthermore, observation of “fast” (sub-millisecond) dynamics may theoretically offer some benefits for recognition accuracy, and act as a bottom-up factor in learning phone segmentation. I also hope to exhibit some results from an (ongoing) phone classification experiment, to identify constraints that should be respected by a successful implementation of some of these ideas.
At Clippers Tuesday, I’ll motivate a new approach to scope taking in combinatory categorial grammar and discuss progress and plans for implementing the approach (in collaboration with Jordan Needle, Carl Pollard, Simon Charlow and Dylan Bumford):
A long-standing puzzle in natural language semantics has been how to explain the exceptional scope behavior of indefinites. Charlow (2014) has recently shown that their exceptional scope behavior can be derived from a dynamic semantics treatment of indefinites, i.e. one where the function of indefinites is to introduce discourse referents into the evolving discourse context. To do so, he showed that (1) a monadic approach to dynamic semantics can be seamlessly integrated with Barker and Shan’s (2015) approach to scope taking in continuized grammars, and (2) once one does so, the exceptional scope of indefinites follows from the way the side effect of introducing a discourse referent survives the process of delimiting the scope of true quantifiers such as those expressed with ‘each’ and ‘every’.
To date, computationally implemented approaches to scope taking have not distinguished indefinites from true quantifiers in a way that accounts for their exceptional scope taking. Although Steedman (2011) has developed an account of indefinites’ exceptional scope taking by treating them as underspecified Skolem terms in a non-standard static semantics for Combinatory Categorial Grammar (CCG), this treatment has not been implemented in its full complexity. Moreover, as Barker and Shan point out, Steedman’s theory appears to undergenerate by not allowing true quantifiers to take scope from medial positions.
Barker and Shan offer a brief sketch of how their approach might be implemented, including how lifting can be invoked lazily to ensure parsing terminates. In this talk, I will show how their approach can be seamlessly combined with Steedman’s CCG and extended to include the first prototype implementation of Charlow’s semantics of indefinites, thereby yielding an approach that improves upon scope taking in CCG while retaining many of its attractive computational properties.
This Tuesday, Micha Elsner will be presenting preliminary work on neural network word segmentation:
Given a corpus of phonemically transcribed utterances with unknown word boundaries, how can a cognitive model extract the vocabulary? I propose a new model based on working memory: the model must balance phonological memory (remembering how to pronounce words) with syntactic memory (remembering the utterance it just heard). Simulating the memory with encoder-decoder RNNs, I use reinforcement learning to optimize the segmentations.
Why build yet another model of word segmentation? (Is this simply a buzzword-compatibility issue? A little bit, but…) I hope to show that this model provides a deeper cognitive account of the prior biases used in previous work, and that its noisy, error-prone reconstruction process makes it inherently robust to variation in its input.
This is work in progress, so don’t expect great things from me yet. However, I will demonstrate model performance slightly worse than Goldwater et al. (2009) on a standard dataset and discuss some directions for future work. Criticism, suggestions and thrown paper airplanes welcome.
This Tuesday, Denis Newman-Griffis will be presenting on learning embeddings for ontology concepts:
Recent work on embedding ontology concepts has relied on either expensive manual annotation or automated concept tagging methods that ignore the textual contexts around concepts. We propose a novel method for jointly learning concept, phrase, and word embeddings from an unlabeled text corpus, by using the representative phrases for ontology concepts as distant supervision. We learn embeddings for medical concepts in the Unified Medical Language System and general-domain concepts in YAGO, using a variety of corpora. Our embeddings show performance competitive with existing methods on concept similarity and relatedness tasks, while requiring no human corpus annotation and demonstrating more than 3x coverage in the vocabulary size.
I’ll also be talking a bit about trying to build an analogy completion dataset for the biomedical domain.
At Clippers tomorrow, Lifeng will present on Two Approaches to Virtual Patient Data:
The main focus of the virtual patient project is question matching. I am going to approach this problem from two different angles. The first is to treat it as a sentence similarity problem and use Siamese CNN models; the second is to treat it as a classification problem and use feedforward neural nets. I am going to present some preliminary results on virtual patient data and the Microsoft paraphrase corpus, and discuss the pros and cons of the two approaches.
At Clippers on Tuesday, Cory and Marty will be presenting two related talks:
Memory access during incremental sentence processing causes reading time latency
Cory Shain, Marten van Schijndel, Richard Futrell, Edward Gibson and William Schuler
Studies on the role of memory as a predictor of reading time latencies (1) differ in their predictions about when memory effects should occur in processing and (2) have had mixed results, with strong positive effects emerging from isolated constructed stimuli and weak or even negative effects emerging from naturally-occurring stimuli. Our study addresses these concerns by comparing several implementations of prominent sentence processing theories on an exploratory corpus and evaluating the most successful of these on a confirmatory corpus, using a new self-paced reading corpus of seemingly natural narratives constructed to contain an unusually high proportion of memory-intensive constructions. We show highly significant and complementary broad-coverage latency effects both for predictors based on the Dependency Locality Theory and for predictors based on a left-corner parsing model of sentence processing. Our results indicate that memory access during sentence processing does take time, but suggest that stimuli requiring many memory access events may be necessary in order to observe the effect.
Addressing surprisal deficiencies in reading time models
Marten van Schijndel and William Schuler
This study demonstrates a weakness in how n-gram and PCFG surprisal are used to
predict reading times in eye-tracking data. In particular, the information conveyed by
words skipped during saccades is not usually included in the surprisal measures. This
study shows that correcting the surprisal calculation improves n-gram surprisal and that
upcoming n-grams affect reading times, replicating previous findings of how lexical frequencies
affect reading times. In contrast, the predictivity of PCFG surprisal does not
benefit from the surprisal correction despite the fact that lexical sequences skipped by
saccades are processed by readers, as demonstrated by the corrected n-gram measure.
These results raise questions about the formulation of information-theoretic measures
of syntactic processing such as PCFG surprisal and entropy reduction when applied to