Do languages differ in semantic transparency of derived words? Using word vectors to explore English and Russian
This study explores whether the semantic relationship of derived words to their bases is similarly sensitive to word frequency in English and Russian. High-frequency derived words are thought to be memorized by speakers rather than parsed into their constituents. As a result, such words may become semantically opaque, which predicts that frequent derived words are, on average, less transparent. We investigated whether differences in the distribution of English and Russian derivational suffixes translate into differences in semantic transparency, measured as the cosine similarity of word vectors. Contrary to expectations, our results show a positive correlation between derived word frequency and semantic transparency. This may reflect suffix-specific effects.
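The measure described above can be illustrated with a minimal sketch. It assumes that transparency is scored as the cosine similarity between the vector of a derived word and the vector of its base, and that the relationship to frequency is summarized with a Pearson correlation over log frequencies. The vectors, word pairs, and frequency counts below are invented toy values, and the function names are hypothetical; none of them come from the study, which may have used a different correlation measure or vector model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy 3-dimensional vectors (assumption: real embeddings have hundreds of
# dimensions and are trained on a large corpus).
VECTORS = {
    "teach": [0.9, 0.1, 0.2],
    "teacher": [0.8, 0.2, 0.3],
    "govern": [0.2, 0.9, 0.1],
    "government": [0.1, 0.6, 0.7],
    "sing": [0.3, 0.2, 0.9],
    "singer": [0.4, 0.1, 0.8],
}

# (base, derived word, corpus frequency of the derived word); the counts
# are invented for illustration only.
PAIRS = [
    ("teach", "teacher", 1200),
    ("govern", "government", 9500),
    ("sing", "singer", 400),
]

# Transparency score per pair: similarity of the derived word to its base.
scores = [cosine_similarity(VECTORS[b], VECTORS[d]) for b, d, _ in PAIRS]
log_freqs = [math.log(freq) for _, _, freq in PAIRS]

# Correlation between log frequency and transparency across the pairs.
r = pearson(log_freqs, scores)

for (base, derived, freq), score in zip(PAIRS, scores):
    print(f"{base} -> {derived}: freq={freq}, cosine={score:.3f}")
print(f"Pearson r (log frequency vs. cosine): {r:.3f}")
```

Under this setup, a negative coefficient would match the expectation that frequent derived words drift away from their bases, while a positive coefficient corresponds to the pattern reported here. Running the same computation separately for each suffix would be one way to look for suffix-specific effects.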