Word Embeddings: Meaning in Vector Space

king - man + woman ≈ queen — meaning as geometry

Word embeddings map words to high-dimensional vectors where semantic relationships become geometric operations. The famous analogy king - man + woman ≈ queen works because gender is encoded as a consistent direction in the vector space.

Formulas

Cosine similarity: cos(A, B) = (A · B) / (|A| * |B|)
Word analogy: v(queen) ≈ v(king) - v(man) + v(woman)
PCA: project high-dimensional vectors onto the top two eigenvectors of the covariance matrix
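
A minimal NumPy sketch of the cosine-similarity formula above; the toy three-dimensional vectors are made up purely for illustration.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """cos(A, B) = (A . B) / (|A| * |B|)"""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors (illustrative only; real embeddings have 100-300 dimensions).
cat = np.array([0.8, 0.1, 0.4])
dog = np.array([0.7, 0.2, 0.5])
car = np.array([0.1, 0.9, 0.6])

print(cosine_similarity(cat, dog))  # high: nearly the same direction
print(cosine_similarity(cat, car))  # lower: a different direction
```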

Words as Points in Space

In 2013, Tomas Mikolov's word2vec paper demonstrated something that felt almost magical: if you train a neural network to predict words from their neighbors in a large text corpus, the internal representations it learns encode meaning as geometry. Words that mean similar things end up near each other. And the directions between words capture abstract relationships — gender, tense, plurality, even geography.
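
In practice you rarely implement this from scratch. Here is a minimal sketch using the gensim library's Word2Vec class; the four-sentence corpus is only a stand-in, since real training uses billions of tokens.

```python
from gensim.models import Word2Vec

# Stand-in corpus; real word2vec training uses billions of tokens.
sentences = [
    ["the", "king", "ruled", "the", "kingdom"],
    ["the", "queen", "ruled", "the", "kingdom"],
    ["the", "man", "walked", "to", "work"],
    ["the", "woman", "walked", "to", "work"],
]

# sg=1 selects the skip-gram objective: predict surrounding words from the center word.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=100)

print(model.wv["king"].shape)         # (50,) -- one dense vector per word
print(model.wv.most_similar("king"))  # nearest neighbors by cosine similarity
```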

The Analogy Engine

The most famous property of word embeddings is analogy by vector arithmetic. The relationship 'man is to king as woman is to ___' becomes a geometric operation: subtract the man vector, add the woman vector, and find the nearest word. The answer — queen — demonstrates that abstract semantic relationships are encoded as parallel vector offsets throughout the space.
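
A minimal sketch of the arithmetic with fabricated two-dimensional vectors, where dimension 0 stands for royalty and dimension 1 for gender; real embeddings learn such directions implicitly rather than having them hand-assigned.

```python
import numpy as np

# Fabricated 2-D vectors: axis 0 ~ royalty, axis 1 ~ gender (illustration only).
vocab = {
    "king":  np.array([0.9, 0.9]),
    "queen": np.array([0.9, 0.1]),
    "man":   np.array([0.1, 0.9]),
    "woman": np.array([0.1, 0.1]),
    "apple": np.array([0.0, 0.5]),
}

def cos(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# king - man + woman: remove the male offset, add the female offset.
target = vocab["king"] - vocab["man"] + vocab["woman"]

# Standard practice: exclude the three query words before taking the nearest neighbor.
candidates = {w: v for w, v in vocab.items() if w not in {"king", "man", "woman"}}
answer = max(candidates, key=lambda w: cos(candidates[w], target))
print(answer)  # queen
```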

Visualizing High Dimensions

Real word embeddings live in 100–300 dimensional spaces that humans cannot visualize directly. This simulation uses PCA (Principal Component Analysis) to project the high-dimensional vectors down to 2D while preserving as much structure as possible. You can see semantic clusters form — animals group together, professions cluster, emotions neighbor each other — even in this lossy projection.
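
A minimal sketch of the projection step with scikit-learn; the random matrix is a stand-in for a real embedding table.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for an embedding table: 1,000 words x 300 dimensions.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 300))

# Project onto the top-2 principal components -- the leading eigenvectors
# of the covariance matrix of the centered embedding matrix.
pca = PCA(n_components=2)
points_2d = pca.fit_transform(embeddings)   # shape (1000, 2), ready to scatter-plot

print(points_2d.shape)
print(pca.explained_variance_ratio_)        # fraction of variance the 2-D view retains
```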

From Static to Contextual

Static embeddings like word2vec give each word exactly one vector, regardless of context. But 'bank' means different things in 'river bank' and 'bank account.' Modern contextual embeddings (BERT, GPT) solve this by generating different vectors for the same word depending on its surrounding sentence — the same principle, but with context-dependent geometry that captures the full richness of polysemy.
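
A hedged sketch of that difference using the Hugging Face transformers library; the model choice (bert-base-uncased) and the simplistic token-matching helper are illustrative assumptions, and running it downloads the model weights.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed_word(sentence: str, word: str) -> torch.Tensor:
    """Return the contextual vector of `word` (assumes it stays a single subtoken)."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]           # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return hidden[tokens.index(word)]

river = embed_word("I sat on the river bank.", "bank")
money = embed_word("I deposited money at the bank.", "bank")

sim = torch.nn.functional.cosine_similarity(river, money, dim=0)
print(f"same word, different contexts: cosine similarity = {sim.item():.3f}")  # < 1.0
```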

FAQ

What are word embeddings?

Word embeddings are dense vector representations of words where semantically similar words are mapped to nearby points in a high-dimensional space. Unlike one-hot encoding (where each word is an isolated symbol), embeddings capture meaning: 'cat' and 'dog' have similar vectors because they appear in similar contexts.
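
A tiny sketch of the contrast with one-hot encoding (the three-word vocabulary is made up): every one-hot pair is orthogonal, so similarity carries no information about meaning.

```python
import numpy as np

# One-hot vectors over a made-up 3-word vocabulary: cat, dog, car.
cat = np.array([1, 0, 0])
dog = np.array([0, 1, 0])
car = np.array([0, 0, 1])

def cos(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cos(cat, dog))  # 0.0 -- related words look no closer...
print(cos(cat, car))  # 0.0 -- ...than unrelated ones
```

Dense embeddings break this symmetry: as in the sketch under Formulas above, 'cat' and 'dog' score far higher than 'cat' and 'car'.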

How does king - man + woman ≈ queen work?

In a well-trained embedding space, the vector from 'man' to 'king' (the royalty direction) is approximately parallel to the vector from 'woman' to 'queen.' So subtracting the 'man' vector and adding the 'woman' vector effectively swaps the gender component while preserving the royalty component, landing near 'queen.'

What is cosine similarity and why is it used?

Cosine similarity measures the angle between two vectors, ignoring their magnitude. It ranges from -1 (opposite) to 1 (identical direction). It's preferred over Euclidean distance for word embeddings because word frequency affects vector length but not direction, and meaning is encoded in direction.
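
A tiny sketch of the magnitude point (vectors made up for illustration): scaling a vector, as higher word frequency roughly does, leaves cosine similarity unchanged but inflates Euclidean distance.

```python
import numpy as np

v = np.array([0.3, 0.8, 0.5])
w = 4.0 * v   # same direction, four times the length (e.g. a much more frequent word)

cos = np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))
print(cos)                    # 1.0 -- direction, and therefore meaning, is unchanged
print(np.linalg.norm(v - w))  # ~3.0 -- Euclidean distance is dominated by magnitude
```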

What are the limitations of word embeddings?

Static embeddings (word2vec, GloVe) assign one vector per word, so they can't handle polysemy ('bank' as riverbank vs. financial institution). They also encode societal biases present in training data. Contextual embeddings (BERT, GPT) address polysemy by generating different vectors based on surrounding context.
