Clippers Tuesday: Lifeng Jin on Two Approaches to Virtual Patient Data

At Clippers tomorrow, Lifeng will present on Two Approaches to Virtual Patient Data:

The main focus of the virtual patient project is question matching. I am going to approach this problem from two different angles. The first is to treat it as a sentence similarity problem and use Siamese CNN models; the second is to treat it as a classification problem and use feedforward neural nets. I am going to present some preliminary results on virtual patient data and the Microsoft paraphrase corpus, and discuss the pros and cons of the two approaches.
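
To make the two framings concrete, here is a minimal sketch of how they differ in shape. It is not the talk's implementation: bag-of-words cosine similarity stands in for the Siamese CNN, a simple multiclass perceptron stands in for the feedforward net, and the canonical questions, labels, and function names are all hypothetical.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words counts; a toy stand-in for a learned sentence encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Hypothetical canonical questions, keyed by a hypothetical label.
CANONICAL = {
    "pain_location": "where is the pain",
    "pain_onset": "when did the pain start",
    "medications": "what medications do you take",
}

def match_by_similarity(query):
    """Similarity framing: score the query against every canonical question."""
    q = bow(query)
    return max(CANONICAL, key=lambda label: cosine(q, bow(CANONICAL[label])))

def score(weights, label, feats):
    """Linear score of one label for a bag-of-words feature vector."""
    return sum(weights[label][w] * c for w, c in feats.items())

def train_perceptron(examples, epochs=10):
    """Classification framing: learn per-label word weights from labeled queries."""
    weights = {label: Counter() for label in CANONICAL}
    for _ in range(epochs):
        for text, gold in examples:
            feats = bow(text)
            pred = max(weights, key=lambda l: score(weights, l, feats))
            if pred != gold:
                # Standard perceptron update: reward gold, penalize the wrong guess.
                for w, c in feats.items():
                    weights[gold][w] += c
                    weights[pred][w] -= c
    return weights

def match_by_classification(weights, query):
    """Predict a label directly, without comparing to canonical question text."""
    feats = bow(query)
    return max(weights, key=lambda l: score(weights, l, feats))

# Hypothetical labeled training queries.
TRAIN = [
    ("where does it hurt", "pain_location"),
    ("when did it begin", "pain_onset"),
    ("are you on any drugs", "medications"),
]
```

The similarity version needs only the text of each canonical question and can accept new questions without retraining, while the classification version needs labeled examples per class but never compares against canonical text at prediction time.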

Clippers Tuesday: Cory Shain and Marty van Schijndel on Reading Time Modeling

At Clippers on Tuesday, Cory and Marty will be presenting two related talks:

Memory access during incremental sentence processing causes reading time latency
Cory Shain, Marten van Schijndel, Richard Futrell, Edward Gibson and William Schuler

Studies on the role of memory as a predictor of reading time latencies (1) differ in their predictions about when memory effects should occur in processing and (2) have had mixed results, with strong positive effects emerging from isolated constructed stimuli and weak or even negative effects emerging from naturally-occurring stimuli. Our study addresses these concerns by comparing several implementations of prominent sentence processing theories on an exploratory corpus and evaluating the most successful of these on a confirmatory corpus, using a new self-paced reading corpus of seemingly natural narratives constructed to contain an unusually high proportion of memory-intensive constructions. We show highly significant and complementary broad-coverage latency effects both for predictors based on the Dependency Locality Theory and for predictors based on a left-corner parsing model of sentence processing. Our results indicate that memory access during sentence processing does take time, but suggest that stimuli requiring many memory access events may be necessary in order to observe the effect.

Addressing surprisal deficiencies in reading time models
Marten van Schijndel and William Schuler

This study demonstrates a weakness in how n-gram and PCFG surprisal are used to predict reading times in eye-tracking data. In particular, the information conveyed by words skipped during saccades is not usually included in the surprisal measures. This study shows that correcting the surprisal calculation improves n-gram surprisal and that upcoming n-grams affect reading times, replicating previous findings of how lexical frequencies affect reading times. In contrast, the predictivity of PCFG surprisal does not benefit from the surprisal correction despite the fact that lexical sequences skipped by saccades are processed by readers, as demonstrated by the corrected n-gram measure. These results raise questions about the formulation of information-theoretic measures of syntactic processing such as PCFG surprisal and entropy reduction when applied to reading times.
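
As a rough illustration of what such a correction could look like, the sketch below compares an uncorrected measure (surprisal of the fixated word only) with a corrected one that also adds the surprisal of every word skipped since the previous fixation. This is an assumed reading of the abstract, not necessarily the exact calculation used in the study; the function names and the probabilities are hypothetical, and regressive saccades are ignored.

```python
import math

def word_surprisals(probs):
    """Surprisal in bits for each word, given its conditional probability."""
    return [-math.log2(p) for p in probs]

def fixation_surprisals(surprisals, fixated):
    """Return (uncorrected, corrected) surprisal for each fixated word.

    uncorrected: surprisal of the fixated word alone.
    corrected: that surprisal plus the surprisal of every word skipped
    since the previous fixation (assumes strictly left-to-right reading).
    """
    results = []
    skipped_total = 0.0
    for s, is_fixated in zip(surprisals, fixated):
        if is_fixated:
            results.append((s, skipped_total + s))
            skipped_total = 0.0
        else:
            # Skipped word: carry its surprisal forward to the next fixation.
            skipped_total += s
    return results

# Hypothetical conditional word probabilities and fixation pattern.
probs = [0.5, 0.25, 0.125, 0.5]
fixated = [True, False, True, True]
pairs = fixation_surprisals(word_surprisals(probs), fixated)
```

In this example the third word follows a skipped word, so its corrected value (5 bits) exceeds its uncorrected value (3 bits), while the other fixated words are unchanged.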