Clippers 10/27: Evan Jaffe on coreference

Evan will present his work on coreference, including a practice talk for his recently accepted COLING paper, as well as newer neural network additions to the model and some new results.

Abstract:
Models of human sentence processing effort tend to focus on costs associated with retrieving structures and discourse referents from memory (memory-based) and/or on costs associated with anticipating upcoming words and structures based on contextual cues (expectation-based) (Levy, 2008). Although evidence suggests that expectation and memory may play separable roles in language comprehension (Levy et al., 2013), theories of coreference processing have largely focused on memory: how comprehenders identify likely referents of linguistic expressions. In this study, we hypothesize that coreference tracking also informs human expectations about upcoming words, and we test this hypothesis by evaluating the degree to which incremental surprisal measures generated by a novel coreference-aware semantic parser explain human response times in a naturalistic self-paced reading experiment. Results indicate (1) that coreference information indeed guides human expectations and (2) that coreference effects on memory retrieval may exist independently of coreference effects on expectations. Together, these findings suggest that the language processing system exploits coreference information both to retrieve referents from memory and to anticipate upcoming material.
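For intuition about the kind of evaluation the abstract describes, here is a minimal sketch of regressing self-paced reading times on per-word surprisal. The column names and toy data are purely illustrative and are not from the paper; the actual analysis presumably uses the coreference-aware parser's surprisal estimates together with a fuller set of controls and mixed-effects structure.

    # Hypothetical sketch: relating per-word surprisal from an incremental model
    # to self-paced reading times. Column names and data are illustrative only.
    import pandas as pd
    import statsmodels.formula.api as smf

    # Toy stand-in for aligned per-word measures (one row per word).
    df = pd.DataFrame({
        "rt":        [310, 295, 402, 350, 288, 415, 330, 360],  # reading time (ms)
        "surprisal": [2.1, 1.4, 6.8, 4.0, 1.1, 7.5, 3.2, 5.0],  # -log2 P(word | context)
        "word_len":  [3, 5, 9, 6, 4, 10, 5, 7],                 # simple control predictor
    })

    # Does surprisal predict reading time above a simple length control?
    fit = smf.ols("rt ~ surprisal + word_len", data=df).fit()
    print(fit.summary())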

Clippers 10/13: Christian leads discussion of Cynical selection of LM training data

CYNICAL SELECTION OF LANGUAGE MODEL TRAINING DATA

The Moore-Lewis method of “intelligent selection of language model training data” is very effective, cheap, efficient… and also has structural problems.
(1) The method defines relevance by playing language models trained on the in-domain and the out-of-domain (or data pool) corpora against each other (a simplified scoring sketch follows this list). This powerful idea – which we set out to preserve – treats the two corpora as the opposing ends of a single spectrum. This lack of nuance does not allow for the two corpora to be very similar. In the extreme case where they come from the same distribution, all of the sentences have a Moore-Lewis score of zero, so there is no resulting ranking.
(2) The selected sentences are not guaranteed to be able to model the in-domain data, nor even to cover the in-domain data. They are simply well-liked by the in-domain model; this is necessary, but not sufficient.
(3) There is no way to tell what the optimal number of sentences to select is, short of picking various thresholds and building the systems.
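For concreteness, the Moore-Lewis score mentioned in (1) is the difference in per-word cross-entropy between an in-domain language model and a pool language model, with lower scores ranked as more relevant. The sketch below is a simplified stand-in that uses add-one-smoothed unigram models and made-up corpora rather than the n-gram models used in practice.

    # Minimal sketch of Moore-Lewis cross-entropy-difference scoring.
    # Unigram models with add-one smoothing stand in for the n-gram LMs used
    # in practice; the corpora below are purely illustrative.
    import math
    from collections import Counter

    def unigram_logprob(counts, total, vocab, word):
        # Add-one smoothed log2 probability of a word under a unigram model.
        return math.log2((counts[word] + 1) / (total + vocab))

    def cross_entropy(sentence, counts, total, vocab):
        # Per-word cross-entropy (bits/word) of a sentence under the model.
        words = sentence.split()
        return -sum(unigram_logprob(counts, total, vocab, w) for w in words) / len(words)

    def moore_lewis_scores(pool, in_domain):
        pool_counts = Counter(w for s in pool for w in s.split())
        in_counts = Counter(w for s in in_domain for w in s.split())
        vocab = len(set(pool_counts) | set(in_counts))
        pool_total, in_total = sum(pool_counts.values()), sum(in_counts.values())
        # Score = H_in(s) - H_pool(s); lower (more negative) = more in-domain-like.
        return sorted(
            (cross_entropy(s, in_counts, in_total, vocab)
             - cross_entropy(s, pool_counts, pool_total, vocab), s)
            for s in pool
        )

    in_domain = ["the patient reported mild pain", "the doctor examined the patient"]
    pool = ["the patient felt pain", "stocks fell sharply today", "the game ended in a draw"]
    for score, sent in moore_lewis_scores(pool, in_domain):
        print(f"{score:+.3f}  {sent}")

Note that if the two corpora were drawn from the same distribution, the two cross-entropies would match and every score would collapse toward zero, which is exactly problem (1) above.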
We present “cynical selection of training data”: a greedy, lazy, approximate, and generally efficient method of accomplishing the same goal (a simplified sketch follows the property list below). It has the following properties:
(1) Is responsive to the extent to which two corpora differ.
(2) Quickly reaches near-optimal vocabulary coverage.
(3) Takes into account what has already been selected.
(4) Does not involve defining any kind of domain, nor any kind of classifier.
(5) Has real units.
(6) Knows approximately when to stop.
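The following is a simplified, unigram-level sketch of greedy selection in this spirit: it repeatedly adds the pool sentence that most lowers the cross-entropy of the task corpus under a model built from the sentences selected so far, and stops when no candidate helps. It is meant only to illustrate properties (3), (5), and (6); it is not Axelrod's actual scoring function, and it skips the lazy, efficient bookkeeping that makes the real method fast.

    # Simplified sketch of greedy, coverage-aware selection in the spirit of
    # "cynical" selection: at each step pick the pool sentence that most lowers
    # the cross-entropy of the task (in-domain) corpus under a unigram model
    # estimated from what has already been selected; stop when nothing helps.
    # Illustrative approximation only; corpora below are made up.
    import math
    from collections import Counter

    def task_cross_entropy(task_counts, task_total, sel_counts, sel_total, vocab):
        # Cross-entropy (bits/word) of the task corpus under an add-one-smoothed
        # unigram model estimated from the selected data.
        return -sum(
            c * math.log2((sel_counts[w] + 1) / (sel_total + vocab))
            for w, c in task_counts.items()
        ) / task_total

    def cynical_style_select(pool, task):
        task_counts = Counter(w for s in task for w in s.split())
        task_total = sum(task_counts.values())
        vocab = len(set(task_counts) | {w for s in pool for w in s.split()})
        selected, sel_counts, sel_total = [], Counter(), 0
        best_h = task_cross_entropy(task_counts, task_total, sel_counts, sel_total, vocab)
        remaining = list(pool)
        while remaining:
            # Score every remaining candidate given what is already selected.
            scored = []
            for s in remaining:
                cand = sel_counts + Counter(s.split())
                h = task_cross_entropy(task_counts, task_total, cand,
                                       sel_total + len(s.split()), vocab)
                scored.append((h, s))
            h, s = min(scored)
            if h >= best_h:      # no candidate improves the model: stop here
                break
            best_h = h
            selected.append(s)
            sel_counts += Counter(s.split())
            sel_total += len(s.split())
            remaining.remove(s)
        return selected, best_h

    task = ["the patient reported mild pain", "the doctor examined the patient"]
    pool = ["the patient felt pain", "stocks fell sharply today", "doctor and patient talked"]
    chosen, h = cynical_style_select(pool, task)
    print(chosen, f"{h:.3f} bits/word")

Because the score is a cross-entropy measured in bits per word, the "real units" property is directly visible, and the stopping rule falls out of the same quantity: selection ends as soon as adding any remaining sentence would no longer reduce it.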