In this work, we explore how a real-time reading tracker can be built efficiently for children’s voices. While previously proposed reading trackers focused on cascaded ASR-based approaches, we propose a fully end-to-end model, which makes it less prone to lag in voice tracking. We employ a pointer network that directly learns to predict positions in the ground-truth text conditioned on the streaming speech. To train this pointer network, we generate ground-truth training signals by forced-aligning the read speech with the text being read in the training set. Exploring different forced alignment models, we find that a neural attention-based model comes close to the Montreal Forced Aligner in alignment accuracy but, surprisingly, provides a better training signal for the pointer network. We report results on one adult speech dataset (TIMIT) and two children’s speech datasets (CMU Kids and Reading Races). Our best model tracks adult speech with 87.8% accuracy, and the much harder, more disfluent children’s speech with 77.1% accuracy on CMU Kids and 65.3% accuracy on Reading Races.
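To make the training setup more concrete, here is one way forced-alignment output could be turned into per-frame pointer targets, together with a pointer distribution over text positions and its cross-entropy loss. This is a minimal sketch under stated assumptions, not the model described above. The following choices are all assumptions: the `(word_index, start_sec, end_sec)` alignment format, the 10 ms frame shift, the rule that a pause keeps pointing at the most recently started word, and dot-product scoring (pointer networks are often described with additive attention). The function names are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical format for forced-alignment output: one (word_index, start_sec, end_sec)
# tuple per word of the text being read.
WordAlignment = Tuple[int, float, float]


def frame_targets(alignment: List[WordAlignment], num_frames: int,
                  frame_shift: float = 0.01) -> List[int]:
    """Label every speech frame with the index of the text word being read.

    Assumptions: frames are frame_shift seconds apart; during pauses the
    target stays on the most recently started word; before the first word
    it is 0. End times are not needed for this simple labeling rule.
    """
    # Convert word start times to integer frame indices to avoid float drift.
    starts = sorted((int(round(start / frame_shift)), idx) for idx, start, _ in alignment)
    targets: List[int] = []
    current, i = 0, 0
    for frame in range(num_frames):
        # Advance past every word that has started by this frame.
        while i < len(starts) and starts[i][0] <= frame:
            current = starts[i][1]
            i += 1
        targets.append(current)
    return targets


def pointer_distribution(frame_state: List[float],
                         token_keys: List[List[float]]) -> List[float]:
    """Softmax over dot-product scores between one streaming speech state
    and the encoding of each token in the text (a pointer over positions)."""
    scores = [sum(q * k for q, k in zip(frame_state, key)) for key in token_keys]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pointer_loss(probs: List[float], target: int) -> float:
    """Cross-entropy of the predicted position distribution against the
    position obtained from forced alignment."""
    return -math.log(probs[target])
```

In a training loop built on this sketch, each frame's pointer distribution would be scored against `frame_targets(...)[frame]`. At inference time, the argmax of the distribution would give the word currently being tracked.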
Safety and Consistency in Dialogue Systems
Safety and consistency of the utterances generated by dialogue systems have been important issues in dialogue system development. A good dialogue system should be safe at all times, even when provoked by users, and consistent with the context, even when the user is not. In this talk, I will present our attempts to address some of the issues related to safety and consistency, with two new datasets, new tasks, and experiments. Different models, including large language models such as ChatGPT and GPT-4, are evaluated on tasks such as safe rewriting and inconsistency resolution to examine their ability to detect unsafe or inconsistent responses in dialogues and amend them. I will discuss how these models behave and what future directions there are for these problems.
Creating an Automated Museum Assistant: Building low-resource document-grounded conversational agents
This week in Clippers, Ash and I would like to discuss our work on constructing a conversational assistant for the COSI Science Museum. Whereas our previous system consisted of a non-conversational query classifier that responded with canned answers, we seek to create a pipeline that conditions a generated response on retrieved facts/documents and the conversation history, with minimal risk of toxic output. Our work proceeds on two fronts: the construction of a retrieval system and the training of a generative LLM. For the retrieval system, we investigate how best to contextualize a query within a conversation and how best to represent documents so that they can be retrieved. For the generative LLM, we fine-tune T5 and Llama and evaluate their responses using automated metrics, including GPT-4-based evaluation, to see which metrics and which model are most effective. Both fronts face an added low-resource challenge, as much of our data and annotations are synthetically generated.
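A toy version of the retrieve-then-generate pipeline might look like the following. It is a sketch under explicit assumptions, not the system described in the talk. Query contextualization is reduced to prepending the last `k` turns. Documents are represented as bag-of-words vectors ranked by cosine similarity, a simple lexical stand-in for whatever representation is actually used. The prompt layout for the generator is hypothetical, and the exhibit texts in the usage example are invented for illustration.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contextualize(history: List[str], query: str, k: int = 2) -> str:
    """Naive contextualization (assumption): prepend the last k turns to the query."""
    return " ".join(history[-k:] + [query])


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: Dict[str, str], top_n: int = 1) -> List[str]:
    """Return the ids of the top_n documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(docs[d]))),
                    reverse=True)
    return ranked[:top_n]


def build_prompt(history: List[str], query: str, facts: List[str]) -> str:
    """Hypothetical prompt that grounds the generator in retrieved facts."""
    lines = ["Facts:"] + [f"- {fact}" for fact in facts]
    lines += ["Conversation:"] + history + [f"Visitor: {query}", "Assistant:"]
    return "\n".join(lines)
```

For example, with `docs = {"planetarium": "The planetarium shows star shows daily at noon.", "dinosaurs": "The dinosaur gallery has a T. rex skeleton cast."}` and the history `["Where can I see a T. rex?"]`, the follow-up "When does it open?" shares no words with either document on its own. Prepending the previous turn supplies "T. rex", so `retrieve(contextualize(history, query), docs)` returns the dinosaur entry.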
Natural Language Comment Generation for SQL
This work is rooted in a larger project aimed at developing a dialogue system that helps non-expert SQL users comprehend database query outputs. My portion of the project focuses on training a model that can generate line-by-line natural language comments that bridge the gap between SQL and the user’s higher-level question. Prior research in SQL explainability has largely focused on translating SQL into templated English or summarizing entire SQL queries with a single comment (Eleftherakis et al., 2021; Narechania et al., 2021). In our generation approach, the comments should faithfully describe the purpose of one or more SQL commands and draw on language from the user’s question, ultimately making SQL parse errors easier for novice users to identify.
Our methods include first building a hand-annotated set of examples, which are then used for few-shot prompting of ChatGPT to generate a relatively small set of seed training items. From there, we experiment with fine-tuning a model (e.g., Llama) to generate natural language comments for any SQL query, using knowledge distillation combined with filtering and editing. The work presented in this talk is ongoing.
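The few-shot prompting and filtering steps could be sketched as follows. Everything here is an assumption for illustration rather than the project's actual pipeline. The assumptions are: the `Question / SQL / Commented SQL` prompt layout; the convention that a `--` comment covers every SQL line up to the next comment, so one comment can describe one or several commands; and the hypothetical filter rule that keeps a generated item only if every SQL line is covered by a non-empty comment and reproduced verbatim. The call to ChatGPT and the manual editing step are omitted.

```python
from typing import List, Optional, Tuple

# Hypothetical few-shot item: (question, SQL lines, commented SQL text).
Shot = Tuple[str, List[str], str]


def format_shot(question: str, sql_lines: List[str], commented: str = "") -> str:
    """Render one example; with commented="" it becomes the open slot to fill."""
    return (f"Question: {question}\nSQL:\n" + "\n".join(sql_lines)
            + f"\nCommented SQL:\n{commented}")


def build_prompt(shots: List[Shot], question: str, sql_lines: List[str]) -> str:
    """Concatenate hand-annotated shots, then the new query for the LLM to comment."""
    parts = [format_shot(q, s, c) for q, s, c in shots]
    parts.append(format_shot(question, sql_lines))
    return "\n\n".join(parts)


def parse_commented_sql(text: str) -> Optional[List[Tuple[str, List[str]]]]:
    """Group SQL lines under the `--` comment that precedes them.

    Returns None if a comment is empty or SQL appears before any comment.
    """
    blocks: List[Tuple[str, List[str]]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("--"):
            comment = line[2:].strip()
            if not comment:
                return None
            blocks.append((comment, []))
        elif blocks:
            blocks[-1][1].append(line)
        else:
            return None  # an SQL line not covered by any comment
    return blocks


def passes_filter(sql_lines: List[str], generated: str) -> bool:
    """Keep an item only if every comment covers some SQL and the SQL is unchanged."""
    blocks = parse_commented_sql(generated)
    if not blocks or any(not lines for _, lines in blocks):
        return False
    produced = [line for _, lines in blocks for line in lines]
    return produced == [s.strip() for s in sql_lines]
```

In this sketch, items that fail `passes_filter` could be discarded or sent to the editing step before the survivors are used to fine-tune the student model.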