Creating an Automated Museum Assistant: Building low-resource document-grounded conversational agents
This week in Clippers, Ash and I would like to discuss our work on constructing a conversational assistant for the COSI Science Museum. Whereas our previous system consisted of a non-conversational query classifier that responded with canned answers, we seek to create a pipeline that conditions a generative response on retrieved facts/documents and conversational history, with minimal risk of toxic output. Our work proceeds on two fronts: the construction of a retrieval system and the training of a generative LLM. For the retrieval system, we investigate how best to contextualize a query within a conversation and how best to represent documents so that they can be retrieved effectively. For the generative LLM, we fine-tune T5 and Llama and evaluate their responses using automated metrics, including GPT-4 as an evaluator, to see which metrics and which model are most effective. Both fronts carry an added low-resource challenge, as much of our data and annotations are synthetically generated.
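
To make the retrieval question concrete, here is a minimal sketch of why contextualizing a query matters. It is not the system discussed in the talk: the concatenation-based `contextualize` function, the TF-IDF scoring, and the example documents are all assumptions chosen purely for illustration, standing in for whatever query rewriting and document representation the actual pipeline uses.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on any non-alphanumeric character.
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


def contextualize(query, history, max_turns=2):
    # Hypothetical stand-in for a learned query rewriter:
    # simply prepend the most recent turns to the current query.
    return " ".join(history[-max_turns:] + [query])


def tfidf_vectors(docs):
    # Build one sparse TF-IDF vector (a dict) per document.
    tokenized = [tokenize(d) for d in docs]
    df = Counter(tok for toks in tokenized for tok in set(toks))
    n = len(docs)
    idf = {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}
    vectors = [
        {tok: tf * idf[tok] for tok, tf in Counter(toks).items()}
        for toks in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, history, docs, k=1):
    # Rank documents by cosine similarity to the contextualized query.
    vectors, idf = tfidf_vectors(docs)
    counts = Counter(tokenize(contextualize(query, history)))
    qvec = {tok: tf * idf[tok] for tok, tf in counts.items() if tok in idf}
    ranked = sorted(
        range(len(docs)), key=lambda i: cosine(qvec, vectors[i]), reverse=True
    )
    return [docs[i] for i in ranked[:k]]


# Invented example documents; not taken from the museum's data.
docs = [
    "The planetarium shows run daily at noon.",
    "The dinosaur gallery features a T. rex skeleton.",
    "Parking is available in the west garage.",
]
history = ["Tell me about the dinosaur gallery."]
print(retrieve("What does it feature?", history, docs))
```

On its own, the follow-up question "What does it feature?" shares no terms with any document, so a lexical retriever has nothing to match on. Once the previous turn is prepended, the terms "dinosaur" and "gallery" point the retriever at the relevant document.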