Predicting Russian Aspect by Frequency Across Genres

Hanne M. Eckhoff
Laura A. Janda
Olga Lyashevskaya

Slavic and East European Journal, vol. 61, no. 4 (Winter 2017), pp. 844–875

We ask whether the aspect of individual verbs can be predicted from the statistical distribution of their inflectional forms, and how this is influenced by genre. To address these questions, we present an analysis of the “grammatical profiles” (relative frequency distributions of inflectional forms) of three samples of verbs extracted from the Russian National Corpus, representing three genres: Journalistic prose, Fiction, and Scientific-Technical prose. We find that the aspect of a given verb can be correctly predicted from the distribution of its forms alone with an average accuracy of 92.7%. Remarkably, this accuracy is statistically indistinguishable from the accuracy of predicting aspect from morphological marking. We maintain that it would be possible for first language learners to use distributional tendencies, in addition to morphological and other cues (for example, semantic and syntactic cues), in acquiring the verbal category of aspect in Russian.

Predicting Verbal Aspect

Hanne M. Eckhoff
Laura A. Janda
Olga Lyashevskaya

This article examines whether the aspect of individual verbs can be predicted from the statistical distribution of their grammatical forms, and whether the genre characteristics of a text affect that distribution. We investigate so-called “grammatical profiles” (relative frequency distributions of verb word forms) using three samples from the Russian National Corpus for the following genres: journalism, fiction, and scientific-technical prose. We conclude that the aspect of an individual verb can be predicted from the distribution of its forms alone with an accuracy of 92.7%. The study shows that there is no statistically significant difference between predicting aspect from the distribution of word forms and predicting aspect from aspectual word-formation patterns. This may support the view that, in acquiring Russian verbal aspect, children perceive tendencies in the distribution of word forms alongside the derivational, semantic, and syntactic features of the verb.

Hanne M. Eckhoff is a Fellow and Tutor in Russian and Linguistics at Oxford University. Eckhoff received her PhD in Russian from the University of Oslo (2007), was a Postdoctoral Fellow at the University of Oslo from 2008 to 2013, and was a Senior Researcher and, briefly, Associate Professor at UiT The Arctic University of Norway from 2013 to 2017, attached to several projects in the field of historical corpus linguistics. Eckhoff plays a leadership role in curating the Old Church Slavonic portion of the PROIEL (Pragmatic Resources in Old Indo-European Languages) corpus.

Laura Janda is Professor of Russian Linguistics at UiT The Arctic University of Norway in Tromsø. Janda received her PhD in Slavic linguistics from UCLA (1984) and has held academic positions at the University of Rochester and UNC-Chapel Hill, in addition to her current post at UiT. Her research focuses primarily on issues surrounding Russian morphology, such as case and aspect.

Olga Lyashevskaya is Professor of Linguistics at the School of Linguistics of the National Research University Higher School of Economics in Moscow, a Leading Research Fellow at the Linguistic Convergence Laboratory of the same university, and a Senior Researcher at the Vinogradov Institute of the Russian Language RAS in Moscow. Lyashevskaya received her PhD from the All-Russian Institute of Scientific and Technical Information of the Russian Academy of Sciences (1999). She is especially known for her work on the Russian National Corpus, the Russian FrameBank, and a corpus-based frequency dictionary of Russian.