At Clippers on Tuesday, Wuwei Lan will be presenting his EMNLP 2017 paper (with Wei Xu)
Title: Automatic Paraphrase Collection and Identification in Twitter
Paraphrase is a restatement of the meaning of a text or passage using other words, which is helpful in many NLP applications, including machine translation, question answering, semantic parsing and textual similarity. Paraphrase resource is valuable and important, but it is hard to get at large scale, especially for sentence level paraphrases. Here we propose a smart way to automatically collect enormous sentential paraphrases from Twitter, which is simply grouping tweets through shared URLs. We gave the largest human-labeled golden corpus of 51,524 pairs, as well as a silver standard corpus which can grow 30k pairs per month with 70% precision. Based on this paraphrase dataset from Twitter, we experimented with deep learning models for automatic paraphrase identification. We find that without pretrained word embedding, we can still achieve state-of-the-art or more competitive results on social media dataset with only character or subword embedding, which is useful in domain with more out-of-vocabulary words or more spelling variations.
At Clippers Tuesday, Denis Newman-Griffis will be presenting his work looking at the topological structure of word embeddings and how that info can (or can’t) be used downstream.
Word embeddings are now one of the most common tools in the NLP toolbox, and we have a good sense of how to train them, tune them, and apply them effectively. However, the structure of how they encode the information used in downstream applications is much less well-understood. In this talk, I present work analyzing nearest neighborhood topological structures derived from trained word embeddings, discarding absolute feature values and maintaining only the relative organization of points. These structures exhibit several interesting properties, including high variance in the organization of neighborhood graphs derived from embeddings trained on the same corpus with different random initializations. Additionally, I show that graph node embeddings trained over the nearest neighbor graph can be substituted for the original word embeddings in both deep and shallow downstream models for named entity recognition and paraphrase detection, with only a small loss to accuracy and even an increase in recall in some cases. While these graph node embeddings suffer from the same issue of high variance due to random initializations, they exhibit some interesting properties of their own, including generating a higher density point space, remarkably poor performance on analogy tasks, and preservation of similarity at the expense of relatedness.
At Clippers on Tuesday, Adam Stiff will present on domain adaptation for question answering systems.
Abstract: On Tuesday I’ll be presenting a proposal for a strategy to train new virtual patients, which would ideally allow an educator to instantiate a new patient from one set of question-answer pairs. The idea follows some fairly recent work in one-shot learning, and the aim will be to leverage much larger corpora to try to encourage semantically similar questions to be close together in some representation space, to limit the need for extensive training for a new model. The idea is still evolving, so the talk will be very informal, and I hope to get feedback and suggestions about related research that I should be reading, potential pitfalls, extensions, etc.
Title: Deconvolutional time series regression: A technique for modeling temporally diffuse effects
Abstract: This talk proposes Deconvolutional Time Series Regression (DTSR), a general-purpose regression technique for modeling sequential data in which effects can reasonably be assumed to be temporally diffuse. DTSR jointly learns linear effect estimates and temporal convolution parameters from parallel temporal sequences of dependent variable(s) and independent variable(s), using the convolution function to assign time-varying weight to the history of each independent variable in computing the prediction for a given regression target. DTSR successfully recovers true latent convolution functions from synthetic data, and on real-world data from several psycholinguistic experiments DTSR both (1) significantly outperforms competing approaches in terms of prediction error on unseen data and (2) provides plausible, fine-grained, and fairly modality-invariant estimates of the time-course of each regressor’s influence on the dependent measure. These results support the superiority of DTSR to standard modeling approaches like linear mixed-effects regression for a range of experiment types.
Authors: Cory Shain and William Schuler