Corpora for Middle and New Indo-Aryan

My research on change in the Indo-Aryan language family relies on its rich textual record. Very little of this record for the Middle Indo-Aryan and New Indo-Aryan stages is available in electronic form, which makes corpus research on these stages very difficult. Part of the effort under NSF-CAREER-1255547 is directed towards making such corpora available to the Indo-Aryan scholarly community.

The DCOMA (Digital Corpus of Old Marathi), currently under construction, aims to offer a searchable corpus of Marathi texts from the Early (1280–1500 CE) and Middle (1500–1750 CE) periods of Marathi. Marathi is one of the few New Indo-Aryan languages with a continuous and rich literary record, and the availability of the Old Marathi corpus will allow us to answer questions about Indo-Aryan diachrony (e.g. the development of case marking patterns, voice alternations, tense/aspect systems, complex predication, etc.) in a much richer way than before.

Another newly initiated project is the Digital Corpus of Middle Indo-Aryan. I am currently converting some key texts from the period (Vasudevahindi of Sanghadasagani and Kuvalayamala of Uddyotanasuri) into machine-readable form using the Harvard-Kyoto convention, and working on building an electronically accessible dictionary. The next stage will involve building a morphological analyzer for Prakrit. At the third stage, the corpus will be extended to Apabhramsa texts and grammatical analysis.
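
To illustrate what a Harvard-Kyoto encoding involves, here is a minimal sketch in Python that converts Harvard-Kyoto ASCII into IAST diacritics. It is not the project's own tooling: the function name hk_to_iast and the table HK_TO_IAST are hypothetical, and the mapping covers only the standard Sanskrit-oriented Harvard-Kyoto symbols. Any additional conventions the corpus may use for Prakrit or Apabhramsa (for example, to mark short e and o) are not assumed here.

```python
# Illustrative sketch only: standard Harvard-Kyoto symbols mapped to IAST.
# Names are hypothetical and not taken from the project.
HK_TO_IAST = {
    # multi-character vowel symbols (must be matched before single letters)
    "lRR": "ḹ", "RR": "ṝ", "lR": "ḷ",
    # long vowels and vocalic r
    "A": "ā", "I": "ī", "U": "ū", "R": "ṛ",
    # anusvara and visarga
    "M": "ṃ", "H": "ḥ",
    # velar and palatal nasals
    "G": "ṅ", "J": "ñ",
    # retroflex stops and nasal (aspirates follow automatically: Th -> ṭh)
    "T": "ṭ", "D": "ḍ", "N": "ṇ",
    # sibilants
    "z": "ś", "S": "ṣ",
}

# Try longer symbols first so that "lRR" is not read as "l" + "R" + "R".
_TOKENS = sorted(HK_TO_IAST, key=len, reverse=True)


def hk_to_iast(text: str) -> str:
    """Convert a Harvard-Kyoto string to IAST; unmapped characters pass through."""
    out = []
    i = 0
    while i < len(text):
        for tok in _TOKENS:
            if text.startswith(tok, i):
                out.append(HK_TO_IAST[tok])
                i += len(tok)
                break
        else:
            # Plain letters (a, k, v, ...) are identical in both schemes.
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for word in ["vasudevahiNDI", "kuvalayamAlA", "saGghadAsagaNi"]:
        print(word, "->", hk_to_iast(word))
```

Because Harvard-Kyoto is case-sensitive, text in this encoding must not be case-folded before conversion or search. One limitation of this greedy approach is that a genuine sequence of consonant l followed by vocalic R would be read as the single symbol lR.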