At Clippers Tuesday, Lifeng Jin will present:
Unsupervised Grammar Induction with Depth-bounded PCFG
There has been recent interest in applying cognitively or empirically motivated bounds on recursion depth to limit the search space of grammar induction models. In this talk, I will present a Bayesian grammar induction model that extends this depth-bounding approach to probabilistic context-free grammar induction (DB-PCFG). DB-PCFG has a smaller parameter space than hierarchic sequence models and therefore more fully exploits the space reductions of depth-bounding.
Results for this model on grammar acquisition from a synthetic dataset and transcribed child-directed speech exceed those of other models when evaluated on parse accuracy. Moreover, grammars acquired from this model demonstrate a consistent use of category labels, something which has not been demonstrated by other acquisition models.
At Clippers Tuesday, Zhen Wang will present joint work with Huan Sun on separating code from natural language text.
Title: Separating Text and Code for Next Utterance Classification in Stack Overflow
Abstract: In this talk, we will discuss our ongoing work on (1) developing tools to separate natural language text and programming code in a Stack Overflow (SO) comment, and (2) applying them to the Next Utterance Classification (NUC) task. In SO, a comment is posted after a question or answer post, and usually contains much information about follow-up questions, suggestions, opinions, etc. It is often a mixture of two different modalities, natural language and programming language, which distinguishes it from comments on other social media like Twitter and Facebook. This bimodal mixture makes it more difficult for machines to understand. We hypothesize that separating code and natural text should be the first step for tasks involving understanding programming-related text. While careful comment writers may use special formatting to distinguish natural words and programming tokens, noisy SO comments like “You will first need to: import collections # to use defaultdict” that simply mix text and code together are also very common. Therefore, in our first task, we study automatically separating code and text in noisy SO comments, which is cast as a sequence labeling problem. In our preliminary experiments, we tested a series of baseline models, including a traditional CRF with hand-crafted features and state-of-the-art neural methods for the NER task. Our results show that for tokens that can appear in both programming and natural language contexts, such as “exception”, “timeout”, and “flatten”, the baseline models cannot accurately predict their labels. We are trying to improve the baseline models using domain-specific knowledge as well as more advanced neural architectures.
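To make the sequence labeling framing concrete, here is a minimal Python sketch of what the input and output of such a labeler might look like for the example comment above. The two-tag scheme (TEXT/CODE), the tokenization, and the particular labels are illustrative assumptions, not the annotation scheme used in the work; the helper `segments` is hypothetical.

```python
from itertools import groupby

# One possible token-level annotation of the example comment.
# The TEXT/CODE tag set and these labels are assumptions for illustration.
tokens = ["You", "will", "first", "need", "to", ":",
          "import", "collections", "#", "to", "use", "defaultdict"]
labels = ["TEXT"] * 6 + ["CODE"] * 2 + ["TEXT"] * 3 + ["CODE"]


def segments(tokens, labels):
    """Group contiguous tokens that share a label into (label, text) segments."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    pairs = zip(labels, tokens)
    return [
        (label, " ".join(tok for _, tok in group))
        for label, group in groupby(pairs, key=lambda pair: pair[0])
    ]


if __name__ == "__main__":
    for label, text in segments(tokens, labels):
        print(f"{label}: {text}")
```

A sequence labeler (a CRF or a neural tagger) would predict the `labels` list from `tokens`; grouping the predictions into segments then gives the separated text and code spans.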
In our second task, we investigate whether separately modeling text and code can help the Next Utterance Classification (NUC) task on SO comments, which is to classify whether an utterance is a response to another. For training, validating, and testing models, we design special rules to collect context-response pairs from Stack Overflow comments containing both natural language and code snippets. Siamese networks with tied Bi-LSTMs were implemented for the NUC task, with and without code snippets treated differently from natural text. Beyond the current work, our research plan is to mine the rich resources in Stack Overflow, understand text-code mixed data, and, in the long run, develop programming-related intelligent assistants.
Any suggestions and comments are highly appreciated.
At Clippers on Tuesday, Jie Zhao will present work with Huan Sun on product-related question answering. Title and abstract below.
Title: Answer Retrieval on E-commerce Websites via Weakly Supervised Question Reformulation
Abstract: In this seminar, I will talk about our ongoing work on product-related question answering on e-commerce websites, which aims to retrieve answers from a large corpus of answer candidates. Our problem setting differs from traditional answer selection, where a small answer candidate set is pre-defined and state-of-the-art approaches generally adopt sophisticated models to match the semantics between the QA pairs. However, these methods become very expensive when the answer candidate set is large and dynamically increasing. In our work, we adopt a classic light-weight TF-IDF search scheme for efficiency reasons but aim at better retrieval results through question reformulation. One of the challenges here is the lack of direct labeled data with pairs. To address this, we look into the word-matching results of the existing QA pairs as weak supervision signals, and define different sub-tasks that 1) learn to focus attention on the question words, 2) infer words that are likely to occur in a true answer, and 3) use the results of the first two sub-tasks as a reformulated question to improve the final retrieval performance. We model the inter-relations among these sub-tasks and train them under a multi-task learning scheme. Preliminary results show our model has the potential to achieve better retrieval performance than existing baseline methods while guaranteeing lower search complexity. Currently, our model still does not perform very well on the second sub-task, possibly because of the large vocabulary space. We are exploring various learning strategies to further improve it. Any suggestions and comments will be appreciated.
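As a rough illustration of how a reformulated question could feed a light-weight TF-IDF search, the Python sketch below re-weights question words and adds predicted answer words before scoring. Everything here is an assumption for illustration: the toy answers, the hand-set weights (which the model described above would learn), the smoothed IDF formula, and the hypothetical helpers `build_index`, `reformulate`, and `search`.

```python
import math
from collections import Counter


def build_index(answers):
    """Tokenize answers and compute a smoothed IDF value for each term."""
    docs = [Counter(answer.lower().split()) for answer in answers]
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in doc)
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0
           for t, df in doc_freq.items()}
    return docs, idf


def reformulate(question, focus_weights, expansion_terms):
    """Re-weight question words (sub-task 1) and add likely answer words (sub-task 2).

    Both weight dictionaries are hand-set stand-ins for learned outputs.
    """
    weights = {t: focus_weights.get(t, 1.0) for t in question.lower().split()}
    for term, weight in expansion_terms.items():
        weights[term] = weights.get(term, 0.0) + weight
    return weights


def search(query_weights, docs, idf):
    """Return answer indices ranked by a weighted TF-IDF score, best first."""
    scores = [
        sum(w * doc[t] * idf.get(t, 0.0) for t, w in query_weights.items())
        for doc in docs
    ]
    return sorted(range(len(docs)), key=lambda i: -scores[i])


# Toy corpus and weights, invented for this example.
answers = [
    "about ten hours on a full charge",
    "the battery cover does come off",
    "it ships in the original box",
]
question = "how long does the battery last"
docs, idf = build_index(answers)

plain = reformulate(question, {}, {})
reformulated = reformulate(
    question,
    focus_weights={"does": 0.1, "the": 0.1},
    expansion_terms={"hours": 2.0, "charge": 1.0},
)

if __name__ == "__main__":
    print("plain query ranking:       ", search(plain, docs, idf))
    print("reformulated query ranking:", search(reformulated, docs, idf))
```

In this toy case the plain question shares no words with the true answer, so word matching alone cannot retrieve it; the added answer words are what bring it to the top, while the search itself stays a simple weighted term lookup.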
At Clippers Tuesday, Evan Jaffe will present work in progress using Sequential Matching Networks (SMNs) to do dialogue response selection.
The SMN architecture is designed to maintain dialogue history (using an RNN) and thus provide extended context. The task is formulated as ranking a set of k candidate responses, given a dialogue history. Preliminary results on a virtual patient dataset show good ranking accuracy (95% on dev) when the network chooses between the true next response and 9 randomly selected negative examples. However, this task may be too easy, so a few more challenging tests are worth exploring, including increasing the size of k and choosing more confusable candidates. An n-gram overlap could be a good baseline. Ultimately, using the SMN to rerank an n-best list coming from a CNN model (Jin et al. 2017) could prove beneficial, complementing the CNN with an ability to track previous turns. This history could be useful for questions with zero anaphora like ‘What dose’, which crucially rely on previous turns for successful interpretation.
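One way the n-gram overlap baseline mentioned above might look is sketched below in Python: each candidate response is scored by its Jaccard overlap with the concatenated dialogue history, and candidates are ranked by that score. The scoring function, the choice of unigrams plus bigrams, and the toy dialogue are assumptions for illustration, not part of the work in progress.

```python
def ngrams(text, n):
    """Return the set of word n-grams in a lowercased, whitespace-tokenized text."""
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


def overlap_score(context, candidate, max_n=2):
    """Sum of Jaccard overlaps between context and candidate for n = 1..max_n."""
    score = 0.0
    for n in range(1, max_n + 1):
        ctx, cand = ngrams(context, n), ngrams(candidate, n)
        union = ctx | cand
        if union:
            score += len(ctx & cand) / len(union)
    return score


def rank(history, candidates):
    """Rank candidate responses by n-gram overlap with the whole dialogue history."""
    context = " ".join(history)
    return sorted(candidates,
                  key=lambda cand: overlap_score(context, cand),
                  reverse=True)


# Toy dialogue, invented for this example (not from the virtual patient data).
history = ["i have been taking ibuprofen for my back pain", "what dose"]
candidates = [
    "my sister drove me here",
    "i take two hundred milligrams of ibuprofen",
    "i work as a mechanic",
]

if __name__ == "__main__":
    for cand in rank(history, candidates):
        print(cand)
```

Because the score uses the whole history rather than only the last turn, the elliptical question ‘what dose’ can still be matched to a response through the earlier mention of the medication.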