Amad Hussain and Henry Leonardi
Abstract: Low-resource dialogue systems often contain a large proportion of few-shot class labels, leading to challenges in utterance classification performance. A potential solution is data augmentation through paraphrase generation, but this method risks introducing harmful data points in the form of low-quality paraphrases. We explore this challenge as a case study using a virtual patient dialogue system, which contains a long-tail distribution of few-shot labels. We investigate the efficacy of paraphrase augmentation through Neural Example Extrapolation (Ex2) using both in-domain and out-of-domain data, as well as the effects of paraphrase validation techniques based on Natural Language Inference (NLI) and reconstruction methods. These data augmentation techniques are validated by training and evaluating a downstream self-attentive RNN model with and without MIXUP. Initial results indicate that paraphrase augmentation improves downstream model performance, though with less benefit than augmenting with MIXUP. Furthermore, we show mixed results both for paraphrase augmentation in combination with MIXUP and for the efficacy of paraphrase validation. These results suggest a trade-off between filtering out misleading paraphrases and preserving paraphrase diversity. In light of these initial findings, we identify promising areas of future work that have the potential to address this trade-off and better leverage paraphrase augmentation, especially in coordination with MIXUP. As this is a work in progress, we hope to have a productive conversation regarding the feasibility of our future directions as well as any larger limitations or directions we should consider.