Clippers 4/18: Jingyi Chen on Multi-Source Morphological Reinflection with Reinforcement Learning

Multi-Source Morphological Reinflection with Reinforcement Learning

This project develops an approach that uses reinforcement learning to guide multi-source morphological reinflection (MRI). MRI is the task of transforming words from one inflectional form to another. For example, when encountering a new inflected form of a word, humans may rely on their knowledge of the morphological rules of the language, as well as their experience with similar forms in the past, to infer the correct inflection. Kann et al. (2017) develop a multi-source MRI model that receives a target tag and multiple source form-tag pairs for a lemma. Their model outperforms single-source reinflection models, as different source forms can provide complementary information. Although Kann et al. do not specify how the source form-tag pairs are chosen, selecting appropriate pairs as reference words is key to modeling morphological reinflection. Our project uses reinforcement learning to select reference words during the reinflection process: an RL agent learns to select an appropriate source form-tag pair based on the context of the lemma and its morphological features, as well as its experience with similar examples in the past, much as humans select an appropriate inflected form based on context and their past experience with the language. Since this project is still ongoing, I would greatly appreciate any suggestions or feedback.
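To make the selection idea concrete, here is a minimal sketch of how an RL agent could learn which source form-tag pair is most useful, cast as an epsilon-greedy bandit. The candidates, tags, and reward signal below are invented for illustration (the talk does not specify the agent's architecture or reward); in practice the reward would come from the downstream reinflection model's accuracy rather than a hand-coded rule.

```python
import random

def select_source_pair(candidates, q_values, epsilon=0.1):
    """Epsilon-greedy choice among candidate (source_form, source_tag) pairs."""
    if random.random() < epsilon:
        return random.randrange(len(candidates))  # explore a random pair
    # exploit: pick the pair with the highest estimated value so far
    return max(range(len(candidates)), key=lambda i: q_values[i])

def update_q(q_values, counts, idx, reward):
    """Incremental-mean update of the chosen pair's value estimate."""
    counts[idx] += 1
    q_values[idx] += (reward - q_values[idx]) / counts[idx]

# Hypothetical candidates for the German lemma "Hund" when the target tag is DAT;PL
candidates = [("Hund", "NOM;SG"), ("Hunde", "NOM;PL"), ("Hundes", "GEN;SG")]
q_values = [0.0] * len(candidates)
counts = [0] * len(candidates)

random.seed(0)
for _ in range(200):
    idx = select_source_pair(candidates, q_values)
    # Simulated reward: a plural source form is more informative for a plural target
    reward = 1.0 if candidates[idx][1] == "NOM;PL" else 0.2
    update_q(q_values, counts, idx, reward)

# With high probability the agent's value estimates now favor ("Hunde", "NOM;PL")
best = candidates[max(range(len(candidates)), key=lambda i: q_values[i])]
```

A full system would condition the choice on the lemma and target tag (a contextual policy rather than a single bandit), but the explore/exploit loop is the same.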

Clippers 4/11: Alyssa Allen on Line-by-Line Comment Generation for SQL

This work is rooted in a larger project aimed at developing a dialogue system that helps non-expert SQL users comprehend database query outputs. Prior research in SQL comment generation has focused on comments that summarize entire SQL queries and on translations of SQL into templated English (Eleftherakis et al., 2021; Narechania et al., 2021). These approaches can be helpful in comprehending SQL but are limited in their ability to guide users through the query steps and connect formal notation with intuitive concepts. To address this limitation, the project aims to generate line-by-line comments that leverage language from user questions, connecting formal SQL notation with user-friendly concepts (e.g. “tallest” or “alphabetical order”).
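To make the target output concrete, a line-by-line comment might pair each clause of a query with wording drawn from the user's question. The question, query, and comments below are invented for illustration; the actual annotations follow the project's own guidelines:

```python
# Hypothetical example: the user asked "Who is the tallest student?"
commented_query = [
    ("SELECT name",          "return the student's name"),
    ("FROM students",        "look in the students table"),
    ("ORDER BY height DESC", "sort the students from tallest to shortest"),
    ("LIMIT 1",              "keep only the tallest one"),
]

def render(pairs):
    """Render SQL with each line followed by its comment as '-- comment'."""
    return "\n".join(f"{sql:<24}-- {comment}" for sql, comment in pairs)

print(render(commented_query))
```

Note how “tallest” is carried over from the question into the comments, rather than paraphrasing `ORDER BY height DESC LIMIT 1` in purely formal terms.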

Due to a lack of pre-existing training data, 100 SQL queries from the SPIDER dataset (Yu et al., 2018) have been manually annotated. These 100 examples will then be used as a base for generating a more robust training set through self-training and prompting. I have been experimenting with using ChatGPT to generate comments for more queries as well as fine-tuning BART for the task. This approach will allow us to scale the training set quickly and minimize time spent writing comments by hand. This presentation will discuss the annotation process and preliminary results for comment generation using the above methods.
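One way the hand-annotated examples could seed generation for new queries is through a few-shot prompt. The prompt layout below is a plausible sketch, not the project's actual prompt (which is not described here), and the seed example is invented:

```python
def build_prompt(annotated_examples, new_query):
    """Assemble a few-shot prompt from hand-annotated (query, comments) pairs.

    Hypothetical layout for LLM-based self-training: show annotated
    examples, then ask for comments on an unannotated query.
    """
    parts = ["Add a line-by-line comment to each line of the SQL query.\n"]
    for query, comments in annotated_examples:
        parts.append("Query:\n" + query)
        parts.append("Comments:\n" + comments + "\n")
    parts.append("Query:\n" + new_query)
    parts.append("Comments:")
    return "\n".join(parts)

# Invented seed annotation and target query
seed = [("SELECT count(*) FROM cars",
         "-- count how many cars there are")]
prompt = build_prompt(seed, "SELECT avg(price) FROM cars")
```

Generated comments that pass a quality check could then be added back to the pool of annotated examples, which is the self-training loop the abstract describes.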