At Clippers Tuesday, Denis Newman-Griffis will be presenting his work looking at the topological structure of word embeddings and how that info can (or can’t) be used downstream.
Word embeddings are now one of the most common tools in the NLP toolbox, and we have a good sense of how to train them, tune them, and apply them effectively. However, the structure of how they encode the information used in downstream applications is much less well-understood. In this talk, I present work analyzing nearest neighborhood topological structures derived from trained word embeddings, discarding absolute feature values and maintaining only the relative organization of points. These structures exhibit several interesting properties, including high variance in the organization of neighborhood graphs derived from embeddings trained on the same corpus with different random initializations. Additionally, I show that graph node embeddings trained over the nearest neighbor graph can be substituted for the original word embeddings in both deep and shallow downstream models for named entity recognition and paraphrase detection, with only a small loss to accuracy and even an increase in recall in some cases. While these graph node embeddings suffer from the same issue of high variance due to random initializations, they exhibit some interesting properties of their own, including generating a higher density point space, remarkably poor performance on analogy tasks, and preservation of similarity at the expense of relatedness.