Given raw (in our case, textual) sentences as input, the Paradigm Discovery Problem (PDP) (Elsner et al., 2019, Erdmann et al., 2020) involves a bi-directional clustering of words into paradigms and cells. For instance, solving the PDP requires one to determine that ring and rang belong to the same paradigm, while bring and bang do not, and that rang and banged belong to the same cell, as they realize the same morphosyntactic property set, i.e., past tense. Solving the PDP is necessary in order to bootstrap to solving what’s often referred to as the Paradigm Cell Filling Problem (PCFP) (Ackerman et al., 2009), i.e., predicting forms that fill yet unrealized cells in partially attested paradigms. That is to say, if I want the plural of thesis, but have only seen the singular, I can only predict theses if I’ve solved the PDP in such a way that allows me to make generalizations regarding how number is realized.
Two forthcoming works address constrained versions of the PDP by focusing on a single part of speech at a time (Erdmann et al., 2020; Kann et al., 2020). For my dissertation, I am trying to adapt the system of Erdmann et al. (2020) to handle the unconstrained PDP by addressing scalability and overfitting issues which lock the system into poor predictions regarding the size of paradigms and prematurely eliminate potentially rewarding regions of the search space. This will be a very informal talk, I’m just looking to get some feedback on some issues I keep running into.
High frequency marker categories in grammar induction
High frequency marker words have been shown crucial in first language acquisition where they provide reliable clues for speech segmentation and grammatical categorization of words. Recent work in model selection of grammar induction has also hinted at a similar role played by high frequency marker words in distributionally inducing grammars. In this work, we first expand the notion of high frequency marker words to high frequency marker categories to include languages where grammatical relations between words are expressed by morphology, not word order. Through analysis of data from previous work and experiments with novel induction models, this work shows that high frequency marker categories are the main drive of accurate grammar induction.
Title: An unsupervised discrete-state sequence model of human language acquisition from speech
Abstract: I will present a progress report on an ongoing attempt to apply discrete-state multi-scale recurrent neural networks as models of child language acquisition from speech. The model is inspired by prior arguments that abstract linguistic representations (e.g. phonemes and words) constrain the acoustic form of natural language utterances, and thus that attempting to efficiently store and anticipate auditory signals may emergently guide child learners to discover underlying linguistic structure. In this study, the artificial learner is a recurrent neural network arranged in interacting layers. Information exchange between adjacent layers is governed by binary detector neurons. When the detector neuron fires between two layers, those layers exchange their current analyses of the input signal in the form of discrete binary codes. Thus, in line with much existing linguistic theory, the model exploits both bottom-up and top-down signals to produce a representation of the input signal that is segmental, discrete, and featural. The learner adapts this behavior in service of four simultaneous unsupervised objectives: reconstructing the past, predicting the future, reconstructing the segment given a label, and reconstructing the label given a segment. Each layer treats the layer below as data, and thus learning is partially driven by attempting to model the learner’s own mental state, in line with influential hypotheses from cognitive neuroscience. The model solves a novel task (unsupervised joint segmentation and labeling of phonemes and words from speech), and it is therefore difficult to establish an overall state of the art performance threshold. However, results for the subtask of unsupervised word segmentation currently lag well behind the state of the art.
BERT is state-of-the-art for event factuality, but still fails on pragmatics
Event factuality prediction is the task of predicting whether an event described in the text is factual or not. It is a complex semantic phenomenon that is important for various NLP downstream tasks e.g. information extraction. For example, in Trump thinks he knows better than the doctors about coronavirus, it is crucial that an information extraction system can identify that Trump knows better than the doctors about coronavirus is nonfactual. Although BERT has boosted the performance of various natural language understanding tasks, its applications to event factuality has been limited to the set-up of natural language inference. In this paper, we investigate how well BERT performs on seven event factuality datasets. We found that although BERT can obtain the new state-of-the-art performance on four existing datasets, it does so by exploiting common surface patterns that correlate with certain factuality labels, while fails on instances where pragmatic reasoning overrides. Unlike the high performance suggests, we are still far away from having a robust system for event factuality prediction.
Models of human sentence processing effort tend to focus on costs
associated with retrieving structures and discourse referents from
memory (memory-based) and/or on costs associated with anticipating
upcoming words and structures based on contextual cues
Although evidence suggests that expectation and memory may play
separable roles in language comprehension (Levy et al 2013), theories of
coreference processing have largely focused on memory: how comprehenders
identify likely referents of linguistic expressions.
In this study, we hypothesize that coreference tracking also informs
human expectations about upcoming words, and we test this hypothesis by
evaluating the degree to which incremental surprisal measures generated
by a novel coreference-aware semantic parser explain human response
times in a naturalistic self-paced reading experiment.
Results indicate (1) that coreference information indeed guides human
expectations and (2) that coreference effects on memory retrieval exist
independently of coreference effects on expectations.
Together, these findings suggest that the language processing system
exploits coreference information both to retrieve referents from memory
and to anticipate upcoming material.
Modeling incremental sentence processing with relational graph convolutional networks
We present an incremental model of sentence processing in which syntactic and semantic information influence each other in an interactive manner. To this end, a PCFG-based left-corner parser (van Schijndel et al. 2013) has previously been extended to incorporate the semantic dependency predicate context (i.e. pair; Levy & Goldberg, 2014) associated with each node in the parse tree. In order to further improve the accuracy and generalizability of this model, dense representations of semantic predicate contexts and syntactic categories are learned and utilized as features for making parsing decisions. More specifically, a relational graph convolutional network (RGCN; Schlichtkrull et al. 2018) is trained to learn representations for predicates, as well as role functions for cuing the representation associated with each of its arguments. In addition, syntactic category embeddings are learned jointly with the parsing sub-models to minimize cross-entropy loss. Ultimately, the goal of the model is to provide a measure of predictability that is sensitive to semantic context, which in turn will serve as a baseline for testing claims about the nature of human sentence processing.
Abstract: In this talk, I will present our paper accepted in AAAI 2020. We conduct a data-driven study focusing on analyzing and predicting sentence deletion — a prevalent but understudied phenomenon in document level Text Simplification on a large English text simplification corpus. We inspect various discourse-level factors associated with sentence deletion, using a new manually annotated sentence alignment corpus we collected. We reveal that professional editors utilize different strategies to meet the readability standards of elementary and middle schools. To predict whether a sentence will be deleted during simplification to a certain level, we harness automatically aligned data to train a classification model. We find that discourse-level factors contribute to the challenging task of predicting sentence deletion for simplification.
Bio: Yang Zhong is a first-year Ph.D. student in the Department of Computer Science and Engineering, advised by Prof. Wei Xu. His research mainly focuses on the stylistic variation of language, as well as in the field of document level text simplification.
Linguistic Marker Discovery with BERT
Detecting politeness in text is a task that has attracted attention in recent years due to its role in identifying abusive language. Previous work have either used feature-based models or deep neural networks for this task. Due to the lack of context, feature-based models perform significantly worse compared to modern deep-learning models. We leverage pretrained Bert representations to provide clustering of words based on their context. We show how we are able to obtain interpretable contextualized features that can help reduce the gap in performance between feature-based models and deep learning approaches.
What can computational methods do for sociolinguistics?
This talk provides a brief overview of computational sociolinguistics, an emerging field with the twin goals of improving NLP systems using sociolinguistics and of answering sociolinguistic questions using NLP and other computational methods. I briefly discuss what sociolinguistics can do for NLP, then turn to what NLP/computational methods can do for sociolinguistics, using two examples from my research: (1) using SVMs for word sense disambiguation on Twitter data to compare regional variation in African American versus white US English, and (2) using hierarchical cluster analysis to study individual differences in patterns of social meaning. Finally, I discuss future directions for computational sociolinguistics.
Title: A typology of ambiguity in medical concept normalization datasets
Medical concept normalization (MCN; also called biomedical word sense disambiguation) is the task of assigning unique concept identifiers (CUIs) to mentions of biomedical concepts. Several MCN datasets focusing on Electronic Health Record (EHR) data have been developed over the past decade, and while several challenges due to conceptual ambiguity have been identified in methodological research, the types of lexical ambiguity exhibited by clinical MCN datasets has not been systematically studied. I will present preliminary results of an ongoing analysis of benchmark clinical MCN datasets, describing an initial, domain-specific typology of lexical ambiguity in MCN annotations. I will also discuss desiderata for future MCN research aimed at addressing these challenges in both methods and evaluation.