Clippers Today: David King on lexical paraphrasing

Automatic paraphrasing with lexical substitution

Generating automatic paraphrases with lexical substitution is a difficult task, but can be useful to supplement data in domain specific machine learning tasks. The Virtual Patient Project is an exact example of this problem, where have limited domain specific training data but need to accurately identify a user’s intended question, an example of which we may have only seen once. In this talk, I will present the progress Amad Hussein, Michael White, and I have made in automatically generating paraphrases, using unsupervised lexical substitution with WordNet, word embeddings, and the Paraphrase Database. Although currently our oracle accuracy in automatically classifying question types is only moderately above our baseline, they are modestly significant and give an estimate of what can be accomplished with human filtering. We propose future work in this direction that utilizes machine translation and phrase level substitution.