In Clippers on Tuesday, I’m going to present the beginning stages of a new project: designing a response generation model for the COSI museum avatar, a virtual question-answering guide at the Language Pod that can answer questions about the pod, linguistics, and other exhibits at COSI. Currently, the avatar, which is modeled after the Virtual Patient project, returns “canned” responses. It has prescribed, static answers for a set of in-domain questions and tries to match each user input to one of those questions. This can make the conversation feel fairly unnatural; if the avatar interprets two utterances as the same question, it repeats the exact same answer. The goal of my current project is to migrate to a response generation model that is more contextually aware and answers questions dynamically, while also adapting to constant changes in content as the museum’s exhibits change. To do so, I’m attempting to leverage OpenAI’s ChatGPT to generate training data for a smaller model that will hopefully avoid pitfalls of LLMs such as toxic behavior. The plan is to eventually train a document-grounded generation model that responds directly to user inputs rather than first mapping them to prescribed questions. This project is in the early exploratory phase, so I’m hoping to get lots of feedback on design choices and suggestions for other avenues to explore.
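The repetition problem follows directly from how a canned-response system works. The toy matcher below is a hypothetical stand-in, not the Virtual Patient project’s actual matcher: it scores each prescribed question by word-overlap (Jaccard) similarity and returns a fixed answer from a two-entry table with placeholder answers. It only illustrates that once two different utterances map to the same prescribed question, the reply is identical.

```python
import re

# Hypothetical canned-response table: prescribed question -> static answer.
# The answers are placeholders, not the avatar's real content.
CANNED = {
    "what is linguistics": "<static answer A>",
    "what is the language pod": "<static answer B>",
}

def tokens(text: str) -> set[str]:
    """Lowercased word tokens of an utterance."""
    return set(re.findall(r"[a-z']+", text.lower()))

def jaccard(a: set[str], b: set[str]) -> float:
    """Word-overlap similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def canned_reply(utterance: str) -> str:
    """Map the utterance to the closest prescribed question and return its fixed answer."""
    u = tokens(utterance)
    best = max(CANNED, key=lambda q: jaccard(u, tokens(q)))
    return CANNED[best]

# Two different phrasings land on the same prescribed question,
# so the avatar gives the identical answer both times.
print(canned_reply("What is linguistics?"))                  # → <static answer A>
print(canned_reply("Can you tell me what linguistics is?"))  # → <static answer A>
```

A generation model avoids this by producing each reply from the conversation and grounding documents rather than looking it up.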
Month: February 2023
Clippers 2/14: Shuaichen Chang on Selective Demonstration for Text-to-SQL
Abstract: Large language models (LLMs) have shown strong generalization in the cross-domain text-to-SQL task without using any in-domain examples. However, their performance can be further improved by providing a few in-domain annotations as demonstration examples. In this work, we first investigate which elements of in-domain examples are crucial. Based on our findings, we propose constructing demonstration examples with minimal in-domain annotation to improve the generalization ability of LLMs.
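To make “demonstration examples” concrete, here is a minimal sketch of how in-domain question/SQL pairs might be placed ahead of a test question in an LLM prompt. The prompt layout, the `Demo` class, `build_prompt`, and the toy `exhibit` schema are all hypothetical; this is a generic few-shot format, not the prompt or the demonstration-selection strategy from the talk.

```python
from dataclasses import dataclass

@dataclass
class Demo:
    """One hypothetical in-domain demonstration: a question paired with its SQL."""
    question: str
    sql: str

def build_prompt(schema: str, demos: list[Demo], question: str) -> str:
    """Assemble a few-shot text-to-SQL prompt: schema, then demos, then the test question."""
    lines = ["-- Database schema:", schema, ""]
    for demo in demos:
        lines += [f"-- Question: {demo.question}", demo.sql, ""]
    # Leave the final SQL open so the LLM completes it.
    lines += [f"-- Question: {question}", "SELECT"]
    return "\n".join(lines)

schema = "CREATE TABLE exhibit (id INTEGER, name TEXT, floor INTEGER);"
demos = [Demo("How many exhibits are there?", "SELECT COUNT(*) FROM exhibit;")]
prompt = build_prompt(schema, demos, "Which exhibits are on floor 2?")
print(prompt)
```

With an empty `demos` list the same function yields a zero-shot prompt, which is the baseline setting the abstract contrasts against.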
Clippers 2/7: Byung-Doh Oh on decomposing autoregressive LM hidden states
While there is much recent interest in studying why Transformer-based large language models make predictions the way they do, the complex computations performed within each layer have traditionally posed a major bottleneck. To mitigate this, this work presents a linear decomposition of the final hidden states of autoregressive language models into contributions from each initial input token, which is exact if the activation function is piecewise linear. This decomposition allows the definition of probability distributions that ablate the contribution of individual input tokens, which can be used to analyze their influence on model probabilities over a sequence of upcoming words with only one forward pass from the model. Using the change in next-word probabilities as a measure of importance, this work examines which context words make the biggest contribution to language model predictions. Regression experiments suggest that Transformer-based language models rely primarily on collocational associations, followed by linguistic factors such as syntactic dependencies and coreference relationships, in making next-word predictions. Additionally, analyses using these measures to predict syntactic dependencies and coreferent mention spans show that collocational association and repetitions of the same token largely explain the language model’s predictions on these tasks.
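The key step, why piecewise linearity makes the decomposition exact, can be illustrated with a toy network. This is a minimal sketch under strong assumptions: the “model” is hypothetical (the context is just the sum of input-token vectors, followed by Linear, ReLU, Linear, and an output layer) and omits attention and layer normalization, which a real Transformer would also require handling. Because ReLU at the summed input equals a fixed 0/1 gate times that input, applying the same gate to each token’s contribution keeps the contributions summing exactly to the full hidden state. Importance is then taken, as one possible reading of the abstract, as the drop in a target word’s probability when one token’s contribution is removed.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array of logits."""
    e = np.exp(z - z.max())
    return e / e.sum()

def relu_decompose(parts: np.ndarray) -> np.ndarray:
    """Pass per-source contributions (rows of `parts`) through ReLU exactly.

    ReLU is piecewise linear: at the summed input x it equals diag(x > 0) @ x.
    Scaling every row by that same 0/1 gate keeps the rows summing to relu(x).
    """
    gate = (parts.sum(axis=0) > 0).astype(parts.dtype)
    return parts * gate

# Hypothetical toy model (not a Transformer) with random weights.
rng = np.random.default_rng(0)
n_tokens, d, vocab = 3, 8, 5
token_vecs = rng.normal(size=(n_tokens, d))
W1, b1 = rng.normal(size=(d, d)), rng.normal(size=d)
W2, b2 = rng.normal(size=(d, d)), rng.normal(size=d)
W_out = rng.normal(size=(d, vocab))

# Rows 0..n_tokens-1 hold each token's contribution; the last row holds biases.
parts = np.vstack([token_vecs @ W1, b1])
parts = relu_decompose(parts) @ W2
parts[-1] += b2

# Ordinary forward pass for comparison: the decomposition is exact.
h_full = np.maximum(token_vecs.sum(axis=0) @ W1 + b1, 0) @ W2 + b2
assert np.allclose(parts.sum(axis=0), h_full)

# Importance of token k for a hypothetical target word: drop in its
# probability when token k's contribution is removed from the final state.
target = 2
p_full = softmax(h_full @ W_out)[target]
importance = np.array(
    [p_full - softmax((h_full - parts[k]) @ W_out)[target] for k in range(n_tokens)]
)
print(importance)
```

All ablations reuse the one decomposed forward pass, which mirrors the abstract’s point that token influence can be analyzed without rerunning the model once per removed token.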