Clippers 11/7: Alyssa Allen on Natural Language Comment Generation for SQL

Natural Language Comment Generation for SQL

This work is rooted in a larger project aimed at developing a dialogue system that helps non-expert SQL users comprehend database query outputs. My portion of the project focuses on training a model that can generate line-by-line natural language comments which bridge the gap between SQL and the user’s higher-level question. Prior research in SQL explainability has largely focused on translating SQL to templated English or summarize entire SQL queries with a comment (Eleftherakis et al., 2021; Narechania et al., 2021). In our generation approach, the comments should faithfully describe the purpose of one or multiple SQL commands and leverage language from the user question, ultimately making SQL parse errors easier for novice users to identify.

Our methods include first building a hand-annotated set of examples, which are then used in few-shot prompting with Chat GPT to generate a relatively small set of seed training items. From there, we experiment with fine-tuning a model (e.g. Llama) that can generate natural language comments for any SQL query, using a knowledge distillation plus filtering and editing approach. Work presented in this talk is ongoing.