Discovery of Semantic Factors in Virtual Patient Dialogues
Much recent NLP work on semantic classification relies on very deep Transformer models, yet there is evidence that these models are poorly suited to tasks with a large label space or scarce data, and their inference speed remains too slow for real-time applications such as dialogue systems. We adapt a simple one-layer recurrent model with multi-headed self-attention to a dialogue task with hundreds of labels, distributed in a long tail over a few thousand examples. By independently forcing the representation of each attention head through a low-dimensional bottleneck, we obtain significant improvements on rare labels over a strong text CNN baseline. The bottlenecks require the model to learn compact representations, and in doing so it discovers factors of the (syntacto-)semantics of the input space that generalize from frequent labels to rare ones. The resulting models lend themselves well to interpretation: analysis shows that the learned representations form clear clusters that span labels in ways that align with human understanding of the semantics of the inputs.
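The per-head bottleneck described above can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation; it assumes an Elman RNN encoder, attention heads that each score every hidden state with their own learned query vector (in the style of structured self-attentive pooling), a tanh projection of each head's pooled vector down to a small dimension, and a linear softmax classifier over the concatenated head outputs. All function names, the dimensions (a bottleneck of 2 per head, 300 labels) and the random, untrained weights are hypothetical choices for illustration.

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    """Hypothetical random initialization; a real model would be trained."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def rnn_encode(tokens, W_in, W_rec):
    """Assumed one-layer Elman RNN: h_t = tanh(W_in x_t + W_rec h_{t-1})."""
    h = [0.0] * len(W_rec)
    states = []
    for x in tokens:
        h = [math.tanh(a + r) for a, r in zip(matvec(W_in, x), matvec(W_rec, h))]
        states.append(h)
    return states


def bottlenecked_heads(states, head_queries, head_bottlenecks):
    """For each head: attention-pool the RNN states with that head's query,
    then squeeze the pooled vector through the head's own narrow projection."""
    hidden = len(states[0])
    features = []
    for q, B in zip(head_queries, head_bottlenecks):
        scores = [sum(qi * hi for qi, hi in zip(q, h)) for h in states]
        weights = softmax(scores)
        pooled = [sum(w * h[j] for w, h in zip(weights, states)) for j in range(hidden)]
        # Each head contributes only len(B) numbers to the classifier.
        features.extend(math.tanh(z) for z in matvec(B, pooled))
    return features


def classify(features, W_out):
    """Linear layer over concatenated head outputs, then softmax over labels."""
    return softmax(matvec(W_out, features))


# Usage with illustrative sizes (hypothetical, not from the paper).
rng = random.Random(0)
EMB, HID, HEADS, BOTTLENECK, LABELS = 8, 16, 4, 2, 300
W_in = rand_matrix(HID, EMB, rng)
W_rec = rand_matrix(HID, HID, rng)
head_queries = [[rng.uniform(-0.5, 0.5) for _ in range(HID)] for _ in range(HEADS)]
head_bottlenecks = [rand_matrix(BOTTLENECK, HID, rng) for _ in range(HEADS)]
W_out = rand_matrix(LABELS, HEADS * BOTTLENECK, rng)

tokens = [[rng.uniform(-1, 1) for _ in range(EMB)] for _ in range(5)]
states = rnn_encode(tokens, W_in, W_rec)
features = bottlenecked_heads(states, head_queries, head_bottlenecks)
probs = classify(features, W_out)
print(len(features), len(probs))  # 8 300
```

Because each head's pooled vector passes through its own narrow projection before concatenation, the classifier sees only HEADS × BOTTLENECK features (8 here) regardless of the hidden size, which is the kind of pressure toward compact, shareable factors that the bottlenecks are meant to create. Training, omitted from this sketch, would fit all weights by gradient descent on a cross-entropy loss.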