This Tuesday, Denis Newman-Griffis will be presenting on learning embeddings for ontology concepts:
Recent work on embedding ontology concepts has relied on either expensive manual annotation or automated concept tagging methods that ignore the textual contexts around concepts. We propose a novel method for jointly learning concept, phrase, and word embeddings from an unlabeled text corpus, by using the representative phrases for ontology concepts as distant supervision. We learn embeddings for medical concepts in the Unified Medical Language System and general-domain concepts in YAGO, using a variety of corpora. Our embeddings show performance competitive with existing methods on concept similarity and relatedness tasks, while requiring no human corpus annotation and covering more than 3x the vocabulary.
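To make the distant-supervision idea concrete, here is a minimal sketch in Python: representative phrases for ontology concepts are tagged in an unlabeled corpus, and a standard skip-gram model (gensim's Word2Vec) then places concept, phrase, and word tokens in one shared space. This is only an illustration of the general idea, not the talk's actual joint training objective; the concept IDs, phrases, and sentences below are made up for the example.

```python
from gensim.models import Word2Vec

# Toy "terminology": concept ID -> representative phrases (e.g., from UMLS).
# These entries are invented for illustration.
concept_phrases = {
    "C0020538": ["high blood pressure", "hypertension"],
    "C0011849": ["diabetes mellitus", "diabetes"],
}

def tag_concepts(sentence, concept_phrases):
    """Replace occurrences of representative phrases with concept tokens,
    keeping the phrase (joined with underscores) so phrases get vectors too."""
    tagged = sentence
    for cui, phrases in concept_phrases.items():
        for phrase in phrases:
            if phrase in tagged:
                tagged = tagged.replace(phrase, f"{cui} {phrase.replace(' ', '_')}")
    return tagged.split()

corpus = [
    "patient has a history of high blood pressure and diabetes mellitus",
    "hypertension was controlled with medication",
]
sentences = [tag_concepts(s, concept_phrases) for s in corpus]

# Skip-gram (sg=1) over the mixed token stream: concepts, phrases, and words
# end up in a single vector space with no manual corpus annotation.
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)

print(model.wv.most_similar("C0020538", topn=3))
```

On real data the tagging step would use the full set of representative phrases from the ontology and a proper phrase matcher, but the overall pipeline (tag, then embed everything together) is the same shape.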
I’ll also be talking a bit about trying to build an analogy completion dataset for the biomedical domain.