At Clippers on Tuesday, I’ll motivate a new approach to scope taking in combinatory categorial grammar and discuss progress and plans for implementing the approach (in collaboration with Jordan Needle, Carl Pollard, Simon Charlow and Dylan Bumford):
A long-standing puzzle in natural language semantics has been how to explain the exceptional scope behavior of indefinites. Charlow (2014) has recently shown that their exceptional scope behavior can be derived from a dynamic semantics treatment of indefinites, i.e. one where the function of indefinites is to introduce discourse referents into the evolving discourse context. To do so, he showed that (1) a monadic approach to dynamic semantics can be seamlessly integrated with Barker and Shan’s (2015) approach to scope taking in continuized grammars, and (2) once one does so, the exceptional scope of indefinites follows from the way the side effect of introducing a discourse referent survives the process of delimiting the scope of true quantifiers such as those expressed with ‘each’ and ‘every’.
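To make the monadic idea a little more concrete, here is a minimal Python sketch of a state-set monad of the kind used in monadic dynamic semantics. A meaning maps an input context (here just a tuple of discourse referents) to a set of (value, output context) pairs, and an indefinite nondeterministically picks an individual and pushes it onto the context. The names `unit`, `bind`, `indefinite` and `every`, the toy model, and the decision to treat `every` as an externally static test are illustrative assumptions, not Charlow's definitions. The sketch also leaves out continuations entirely, so it does not derive exceptional scope. It shows only one ingredient: when an indefinite's effect is bound outside a universal, its discourse referent persists in the output context, whereas referents introduced inside the universal are discarded.

```python
from typing import Any, Callable, FrozenSet, Iterable, Tuple

# A discourse context: the discourse referents introduced so far.
Context = Tuple[str, ...]
# State-set monad: a meaning maps an input context to a set of (value, output context) pairs.
Meaning = Callable[[Context], FrozenSet[Tuple[Any, Context]]]


def unit(x: Any) -> Meaning:
    # Return x without changing the context or introducing alternatives.
    return lambda s: frozenset({(x, s)})


def bind(m: Meaning, k: Callable[[Any], Meaning]) -> Meaning:
    # Run m, then feed each resulting value and output context to k, pooling the results.
    return lambda s: frozenset(pair for (x, s1) in m(s) for pair in k(x)(s1))


def indefinite(restrictor: Iterable[str]) -> Meaning:
    # 'a P': choose any x in P and push x onto the context as a new discourse referent.
    return lambda s: frozenset((x, s + (x,)) for x in restrictor)


def every(restrictor: Iterable[str], scope: Callable[[str], Meaning]) -> Meaning:
    # Assumed universal: a test that succeeds iff the scope can be true for every x.
    # Referents introduced inside the scope are discarded (the input context is returned).
    def run(s: Context) -> FrozenSet[Tuple[Any, Context]]:
        ok = all(any(v for (v, _) in scope(x)(s)) for x in restrictor)
        return frozenset({(ok, s)})
    return run


# Hypothetical toy model: each student read a different book.
STUDENTS = ("s1", "s2")
BOOKS = ("b1", "b2")
READ = {("s1", "b1"), ("s2", "b2")}

# 'Every student read a book' with the indefinite inside the universal (narrow scope).
narrow = every(STUDENTS, lambda x: bind(indefinite(BOOKS), lambda y: unit((x, y) in READ)))
# The same sentence with the indefinite's effect bound outside the universal (wide scope).
wide = bind(indefinite(BOOKS), lambda y: every(STUDENTS, lambda x: unit((x, y) in READ)))
```

Here `narrow(())` is true with an empty output context, while each outcome of `wide(())` is false but still carries the chosen book as a discourse referent.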
To date, computationally implemented approaches to scope taking have not distinguished indefinites from true quantifiers in a way that accounts for their exceptional scope taking. Although Steedman (2011) has developed an account of indefinites’ exceptional scope taking by treating them as underspecified Skolem terms in a non-standard static semantics for Combinatory Categorial Grammar (CCG), this treatment has not been implemented in its full complexity. Moreover, as Barker and Shan point out, Steedman’s theory appears to undergenerate, since it does not allow true quantifiers to take scope from medial positions.
Barker and Shan offer a brief sketch of how their approach might be implemented, including how lifting can be invoked lazily to ensure parsing terminates. In this talk, I will show how their approach can be seamlessly combined with Steedman’s CCG and extended to include the first prototype implementation of Charlow’s semantics of indefinites, thereby yielding an approach that improves upon scope taking in CCG while retaining many of its attractive computational properties.
This Tuesday, Micha Elsner will be presenting preliminary work on neural network word segmentation:
Given a corpus of phonemically transcribed utterances with unknown word boundaries, how can a cognitive model extract the vocabulary? I propose a new model based on working memory: the model must balance phonological memory (remembering how to pronounce words) with syntactic memory (remembering the utterance it just heard). Simulating the memory with encoder-decoder RNNs, I use reinforcement learning to optimize the segmentations.
Why build yet another model of word segmentation? (Is this simply a buzzword-compatibility issue? A little bit, but…) I hope to show that this model provides a deeper cognitive account of the prior biases used in previous work, and that its noisy, error-prone reconstruction process makes it inherently robust to variation in its input.
This is work in progress, so don’t expect great things from me yet. However, I will show that the model performs slightly worse than Goldwater et al. (2009) on a standard dataset and discuss some directions for future work. Criticism, suggestions and thrown paper airplanes welcome.
This Tuesday, Denis Newman-Griffis will be presenting on learning embeddings for ontology concepts:
Recent work on embedding ontology concepts has relied on either expensive manual annotation or automated concept tagging methods that ignore the textual contexts around concepts. We propose a novel method for jointly learning concept, phrase, and word embeddings from an unlabeled text corpus, by using the representative phrases for ontology concepts as distant supervision. We learn embeddings for medical concepts in the Unified Medical Language System and general-domain concepts in YAGO, using a variety of corpora. Our embeddings show performance competitive with existing methods on concept similarity and relatedness tasks, while requiring no human corpus annotation and achieving more than 3x the vocabulary coverage.
I’ll also be talking a bit about trying to build an analogy completion dataset for the biomedical domain.
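The general idea of using concept phrases as distant supervision can be illustrated with a deliberately simplified sketch. Assuming a skip-gram-style objective and a hypothetical `phrase_to_concepts` dictionary that maps each representative phrase to concept IDs (the ID below is made up, not a real UMLS or YAGO identifier), every matched phrase and each of its concepts receive the phrase's surrounding words as context targets, alongside ordinary word-to-word pairs. The function names, the greedy longest-match tagging, and the window handling are illustrative assumptions, not the method presented in the talk.

```python
from typing import Dict, List, Set, Tuple


def find_phrases(tokens: List[str], phrase_to_concepts: Dict[str, Set[str]],
                 max_len: int = 4) -> List[Tuple[int, int, str]]:
    # Greedy longest-match tagging: returns non-overlapping (start, end, phrase) spans.
    spans = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in phrase_to_concepts:
                spans.append((i, i + n, phrase))
                i += n
                break
        else:
            i += 1  # no phrase starts here; move on one token
    return spans


def training_pairs(tokens: List[str], phrase_to_concepts: Dict[str, Set[str]],
                   window: int = 2) -> List[Tuple[str, str]]:
    pairs = []
    # Ordinary word-level skip-gram pairs.
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((word, tokens[j]))
    # Distantly supervised pairs: the phrase and its concepts share the words around the span.
    for start, end, phrase in find_phrases(tokens, phrase_to_concepts):
        context = tokens[max(0, start - window):start] + tokens[end:end + window]
        for target in [phrase] + sorted(phrase_to_concepts[phrase]):
            pairs.extend((target, c) for c in context)
    return pairs
```

For example, `training_pairs("patients with heart attack were treated".split(), {"heart attack": {"CONCEPT_A"}}, window=1)` pairs both `"heart attack"` and the hypothetical `"CONCEPT_A"` with the context words `"with"` and `"were"`.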
This past Tuesday, 2/7, Evan Jaffe presented on his progress on the Virtual Patient project:
I’ll be discussing results on a baseline log-linear model and the improvement gained from using a simple embedding similarity feature. I’ll also discuss motivation/related work and current status of implementing a simple CNN with padding and max pooling to do multiclass classification for this dataset.
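For readers unfamiliar with the architecture mentioned above, here is a minimal pure-Python sketch of the forward pass of such a CNN. Each input is zero-padded (or truncated) to a fixed length, convolution filters slide over windows of word embeddings, max pooling keeps each filter's strongest activation, and a softmax layer produces class probabilities. All names, dimensions, and the ReLU activation are assumptions for illustration; training is omitted, and this is not the project's actual model.

```python
import math
from typing import List

Vector = List[float]


def pad(tokens: List[Vector], length: int, dim: int) -> List[Vector]:
    # Truncate to `length` tokens, then right-pad with zero vectors.
    padded = tokens[:length]
    return padded + [[0.0] * dim for _ in range(length - len(padded))]


def conv1d(seq: List[Vector], filters: List[List[Vector]], width: int,
           bias: Vector) -> List[Vector]:
    # Each filter is `width` weight vectors; returns one ReLU feature map per filter.
    maps = []
    for f, b in zip(filters, bias):
        acts = []
        for i in range(len(seq) - width + 1):
            z = b + sum(w * x for k in range(width) for w, x in zip(f[k], seq[i + k]))
            acts.append(max(0.0, z))
        maps.append(acts)
    return maps


def max_pool(maps: List[Vector]) -> Vector:
    # Max over positions: one feature per filter, independent of sentence length.
    return [max(acts) for acts in maps]


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def classify(tokens: List[Vector], filters: List[List[Vector]], bias: Vector,
             out_weights: List[Vector], out_bias: Vector, length: int, width: int) -> Vector:
    dim = len(filters[0][0])
    features = max_pool(conv1d(pad(tokens, length, dim), filters, width, bias))
    logits = [b + sum(w * f for w, f in zip(row, features))
              for row, b in zip(out_weights, out_bias)]
    return softmax(logits)


# Toy usage: two 2-d word embeddings, one width-2 filter, two output classes.
probs = classify(
    tokens=[[1.0, 0.0], [0.0, 1.0]],
    filters=[[[1.0, 0.0], [0.0, 1.0]]],
    bias=[0.0],
    out_weights=[[1.0], [-1.0]],
    out_bias=[0.0, 0.0],
    length=4,
    width=2,
)
```

Because windows made only of padding still produce ReLU(bias), some implementations mask padded positions before pooling.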