Words as Points in Space
In 2013, the word2vec papers from Tomas Mikolov and his colleagues demonstrated something that felt almost magical: if you train a shallow neural network to predict a word from its neighbors in a large text corpus (or the neighbors from the word), the internal representations it learns encode meaning as geometry. Words that mean similar things end up near each other, and the directions between words capture abstract relationships: gender, tense, plurality, even geography.
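A minimal sketch of that training setup, assuming the gensim library (the original names no library, and the toy corpus here is far too small to learn real structure; meaningful geometry needs millions of sentences):

```python
from gensim.models import Word2Vec

# A handful of illustrative sentences; a real corpus would be vastly larger.
corpus = [
    ["the", "king", "ruled", "the", "kingdom"],
    ["the", "queen", "ruled", "the", "kingdom"],
    ["the", "river", "bank", "flooded"],
    ["she", "deposited", "money", "at", "the", "bank"],
]

model = Word2Vec(
    sentences=corpus,
    vector_size=50,   # dimensionality of the learned vectors
    window=2,         # context window: how many neighbors count
    min_count=1,      # keep every word, even rare ones (toy corpus)
    sg=1,             # 1 = skip-gram: predict the neighbors from the word
)

# Each word is now a point in 50-dimensional space.
print(model.wv["king"].shape)          # (50,)
print(model.wv.most_similar("king"))   # nearest words by cosine similarity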
The Analogy Engine
The most famous property of word embeddings is analogy by vector arithmetic. The relationship 'man is to king as woman is to ___' becomes a geometric operation: start from the king vector, subtract the man vector, add the woman vector, and find the word nearest the result. That word, queen, demonstrates that abstract semantic relationships are encoded as roughly parallel vector offsets throughout the space.
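Here is the arithmetic itself as a sketch. The tiny hand-built vectors are illustrative stand-ins, not real embeddings; in practice the dict would be loaded from a trained model. Excluding the input words from the nearest-neighbor search is standard practice, since the result often lands closest to one of them:

```python
import numpy as np

# Hypothetical 3-dimensional "embeddings" chosen so the analogy works out.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.5, 0.1, 0.1]),
    "woman": np.array([0.5, 0.1, 0.9]),
    "queen": np.array([0.9, 0.8, 0.9]),
    "apple": np.array([0.1, 0.9, 0.5]),
}

def analogy(a, b, c):
    """Solve 'a is to b as c is to ___': start at b, subtract a, add c."""
    target = vectors[b] - vectors[a] + vectors[c]
    best, best_sim = None, -1.0
    for word, vec in vectors.items():
        if word in (a, b, c):   # exclude the input words themselves
            continue
        sim = vec @ target / (np.linalg.norm(vec) * np.linalg.norm(target))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

print(analogy("man", "king", "woman"))  # -> queen
```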
Visualizing High Dimensions
Real word embeddings live in 100- to 300-dimensional spaces that humans cannot visualize directly. This simulation uses PCA (Principal Component Analysis) to project the high-dimensional vectors down to 2D while preserving as much of their variance as possible. You can see semantic clusters form even in this lossy projection: animals group together, professions cluster, emotions neighbor each other.
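A sketch of that projection step, assuming scikit-learn and using random placeholder vectors (in practice the rows would come from a trained model like the word2vec sketch above):

```python
import numpy as np
from sklearn.decomposition import PCA

words = ["cat", "dog", "wolf", "doctor", "nurse", "happy", "sad"]
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(words), 300))  # placeholder 300-dim vectors

pca = PCA(n_components=2)             # keep the 2 directions of highest variance
points_2d = pca.fit_transform(embeddings)

# How much of the original variance survives the projection to 2D:
print(pca.explained_variance_ratio_.sum())

for word, (x, y) in zip(words, points_2d):
    print(f"{word:>7}: ({x:+.2f}, {y:+.2f})")
```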
From Static to Contextual
Static embeddings like word2vec give each word exactly one vector, regardless of context. But 'bank' means different things in 'river bank' and 'bank account.' Modern contextual embeddings (BERT, GPT) solve this by producing a different vector for the same word depending on the sentence around it: the same principle, but with context-dependent geometry that captures the full richness of polysemy.
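A sketch of pulling those context-dependent vectors out of BERT, assuming the Hugging Face transformers library (the original names the model but no library). The same surface word 'bank' gets a noticeably different vector in each sentence:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def vector_for(sentence, word):
    """Return the contextual embedding of `word`'s first occurrence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]   # (tokens, 768)
    idx = inputs.input_ids[0].tolist().index(tokenizer.convert_tokens_to_ids(word))
    return hidden[idx]

v_river = vector_for("She sat on the river bank.", "bank")
v_money = vector_for("She opened a bank account.", "bank")

# A static embedding would make this exactly 1.0; contextual vectors
# pull the two senses of 'bank' apart.
print(torch.cosine_similarity(v_river, v_money, dim=0).item())
```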