In Clippers on Tuesday, February 6th, I will present the results of a user study that we (Lingbo Mo, Huan Sun, Mike White, and I) conducted to test the viability of an interactive semantic parsing system we built. The system was designed to help users query a knowledge base in natural language, removing the need to know the knowledge base's query language and thus making the information accessible to novice users. Our system decomposes the query into pieces and translates each piece into understandable natural language, so that users can see exactly how the system reached an answer and can be confident in it. If the parse is incorrect, the user can instead correct it through a natural language interface.
This work was conducted in the “pre-LLM era,” so much of the technical contribution is now somewhat outdated. However, the user study, in which we had crowdworkers test several versions of the system, applies broadly to human evaluation of dialogue systems. As dialogue systems become ever more ubiquitous, we believe our experience conducting this user study offers important lessons for evaluation methodology.
My goal for Clippers is to sharpen the “story” for a paper about evaluation: this project has spanned many years, and there is a great deal of content to sift through. I hope to get fresh eyes on that content and receive feedback on which pieces are most salient.