Word Embeddings: Meaning in Vector Space

king - man + woman ≈ queen — meaning as geometry

Word embeddings map words to high-dimensional vectors where semantic relationships become geometric operations. The famous analogy king - man + woman ≈ queen works because gender is encoded as a consistent direction in the vector space.

Formulas

Cosine similarity: cos(A, B) = (A · B) / (|A| * |B|)
Word analogy: v(queen) ≈ v(king) - v(man) + v(woman)
PCA: project high-dimensional vectors onto the top two eigenvectors of the covariance matrix
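
A minimal NumPy sketch of the cosine-similarity formula above; the toy three-dimensional vectors are made up purely for illustration.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """cos(A, B) = (A . B) / (|A| * |B|)"""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors (illustrative only; real embeddings have 100-300 dimensions).
cat = np.array([0.8, 0.1, 0.4])
dog = np.array([0.7, 0.2, 0.5])
car = np.array([0.1, 0.9, 0.6])

print(cosine_similarity(cat, dog))  # high: nearly the same direction
print(cosine_similarity(cat, car))  # lower: a different direction
```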

Words as Points in Space

In 2013, Tomas Mikolov's word2vec paper demonstrated something that felt almost magical: if you train a neural network to predict words from their neighbors in a large text corpus, the internal representations it learns encode meaning as geometry. Words that mean similar things end up near each other. And the directions between words capture abstract relationships — gender, tense, plurality, even geography.
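
In practice you rarely implement this from scratch. Here is a minimal sketch using the gensim library's Word2Vec class; the four-sentence corpus is only a stand-in, since real training uses billions of tokens.

```python
from gensim.models import Word2Vec

# Stand-in corpus; real word2vec training uses billions of tokens.
sentences = [
    ["the", "king", "ruled", "the", "kingdom"],
    ["the", "queen", "ruled", "the", "kingdom"],
    ["the", "man", "walked", "to", "work"],
    ["the", "woman", "walked", "to", "work"],
]

# sg=1 selects the skip-gram objective: predict surrounding words from the center word.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=100)

print(model.wv["king"].shape)         # (50,) -- one dense vector per word
print(model.wv.most_similar("king"))  # nearest neighbors by cosine similarity
```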

The Analogy Engine

The most famous property of word embeddings is analogy by vector arithmetic. The relationship 'man is to king as woman is to ___' becomes a geometric operation: subtract the man vector, add the woman vector, and find the nearest word. The answer — queen — demonstrates that abstract semantic relationships are encoded as parallel vector offsets throughout the space.
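
A minimal sketch of the arithmetic with fabricated two-dimensional vectors, where dimension 0 stands for royalty and dimension 1 for gender; real embeddings learn such directions implicitly rather than having them hand-assigned.

```python
import numpy as np

# Fabricated 2-D vectors: axis 0 ~ royalty, axis 1 ~ gender (illustration only).
vocab = {
    "king":  np.array([0.9, 0.9]),
    "queen": np.array([0.9, 0.1]),
    "man":   np.array([0.1, 0.9]),
    "woman": np.array([0.1, 0.1]),
    "apple": np.array([0.0, 0.5]),
}

def cos(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# king - man + woman: remove the male offset, add the female offset.
target = vocab["king"] - vocab["man"] + vocab["woman"]

# Standard practice: exclude the three query words before taking the nearest neighbor.
candidates = {w: v for w, v in vocab.items() if w not in {"king", "man", "woman"}}
answer = max(candidates, key=lambda w: cos(candidates[w], target))
print(answer)  # queen
```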

Visualizing High Dimensions

Real word embeddings live in 100–300 dimensional spaces that humans cannot visualize directly. This simulation uses PCA (Principal Component Analysis) to project the high-dimensional vectors down to 2D while preserving as much structure as possible. You can see semantic clusters form — animals group together, professions cluster, emotions neighbor each other — even in this lossy projection.
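
A minimal sketch of the projection step with scikit-learn; the random matrix is a stand-in for a real embedding table.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for an embedding table: 1,000 words x 300 dimensions.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 300))

# Project onto the top-2 principal components -- the leading eigenvectors
# of the covariance matrix of the centered embedding matrix.
pca = PCA(n_components=2)
points_2d = pca.fit_transform(embeddings)   # shape (1000, 2), ready to scatter-plot

print(points_2d.shape)
print(pca.explained_variance_ratio_)        # fraction of variance the 2-D view retains
```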

From Static to Contextual

Static embeddings like word2vec give each word exactly one vector, regardless of context. But 'bank' means different things in 'river bank' and 'bank account.' Modern contextual embeddings (BERT, GPT) solve this by generating different vectors for the same word depending on its surrounding sentence — the same principle, but with context-dependent geometry that captures the full richness of polysemy.
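
A hedged sketch of that difference using the Hugging Face transformers library; the model choice (bert-base-uncased) and the simplistic token-matching helper are illustrative assumptions, and running it downloads the model weights.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed_word(sentence: str, word: str) -> torch.Tensor:
    """Return the contextual vector of `word` (assumes it stays a single subtoken)."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]           # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return hidden[tokens.index(word)]

river = embed_word("I sat on the river bank.", "bank")
money = embed_word("I deposited money at the bank.", "bank")

sim = torch.nn.functional.cosine_similarity(river, money, dim=0)
print(f"same word, different contexts: cosine similarity = {sim.item():.3f}")  # < 1.0
```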

FAQ

What are word embeddings?

Word embeddings are dense vector representations of words where semantically similar words are mapped to nearby points in a high-dimensional space. Unlike one-hot encoding (where each word is an isolated symbol), embeddings capture meaning: 'cat' and 'dog' have similar vectors because they appear in similar contexts.
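
A tiny sketch of the contrast with one-hot encoding (the three-word vocabulary is made up): every one-hot pair is orthogonal, so similarity carries no information about meaning.

```python
import numpy as np

# One-hot vectors over a made-up 3-word vocabulary: cat, dog, car.
cat = np.array([1, 0, 0])
dog = np.array([0, 1, 0])
car = np.array([0, 0, 1])

def cos(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cos(cat, dog))  # 0.0 -- related words look no closer...
print(cos(cat, car))  # 0.0 -- ...than unrelated ones
```

Dense embeddings break this symmetry: as in the sketch under Formulas above, 'cat' and 'dog' score far higher than 'cat' and 'car'.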

How does king - man + woman ≈ queen work?

In a well-trained embedding space, the vector from 'man' to 'king' (the royalty direction) is approximately parallel to the vector from 'woman' to 'queen.' So subtracting the 'man' vector and adding the 'woman' vector effectively swaps the gender component while preserving the royalty component, landing near 'queen.'

What is cosine similarity and why is it used?

Cosine similarity measures the angle between two vectors, ignoring their magnitude. It ranges from -1 (opposite) to 1 (identical direction). It's preferred over Euclidean distance for word embeddings because word frequency affects vector length but not direction, and meaning is encoded in direction.
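
A tiny sketch of the magnitude point (vectors made up for illustration): scaling a vector, as higher word frequency roughly does, leaves cosine similarity unchanged but inflates Euclidean distance.

```python
import numpy as np

v = np.array([0.3, 0.8, 0.5])
w = 4.0 * v   # same direction, four times the length (e.g. a much more frequent word)

cos = np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))
print(cos)                    # 1.0 -- direction, and therefore meaning, is unchanged
print(np.linalg.norm(v - w))  # ~3.0 -- Euclidean distance is dominated by magnitude
```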

What are the limitations of word embeddings?

Static embeddings (word2vec, GloVe) assign one vector per word, so they can't handle polysemy ('bank' as riverbank vs. financial institution). They also encode societal biases present in training data. Contextual embeddings (BERT, GPT) address polysemy by generating different vectors based on surrounding context.
