In NLP, the idea behind vector embeddings (word embeddings) is that words, which start out as one-hot encoded vectors over the vocabulary, can be mapped to dense, low-dimensional vectors. This lets models capture the meaning of words based on the contexts in which they appear.
As usual, we pass a word through an encoder to create a low-dimensional embedding. Because the meaning of a word depends on its context (i.e., the words that appear nearby), the decoder is trained to predict those nearby words.
We can use a self-supervised objective (such as predicting the next word/token, or predicting nearby words) to learn embeddings over tokens. Models like word2vec and GloVe learn static embeddings, with a single embedding per word shared across all of its senses. RNN- and transformer-based models learn contextual embeddings, where the embedding of the same word changes depending on the sentence it appears in.
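To make the encoder/decoder picture concrete, here is a minimal skip-gram-style sketch in PyTorch. The vocabulary size, embedding dimension, and word ids are made-up assumptions, and the linear decoder is just one simple choice of objective, not the setup of any particular library or paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed (made-up) vocabulary size and embedding dimension
vocab_size, embed_dim = 10_000, 128

# Encoder: maps a word id to a dense, low-dimensional embedding
encoder = nn.Embedding(vocab_size, embed_dim)
# Decoder: maps an embedding to scores over the vocabulary,
# used to predict which words appear nearby
decoder = nn.Linear(embed_dim, vocab_size)

center_ids = torch.tensor([42, 7])    # hypothetical center-word ids
context_ids = torch.tensor([13, 99])  # hypothetical nearby-word ids

logits = decoder(encoder(center_ids))        # shape: (batch, vocab_size)
loss = F.cross_entropy(logits, context_ids)  # self-supervised objective
loss.backward()                              # gradients update the embeddings
```

After training on many (center word, nearby word) pairs, the rows of the encoder's weight matrix serve as the learned static embeddings.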
Some commonly used models are word2vec and GloVe for static embeddings, and transformer-based models such as BERT for contextual embeddings.
Distance measures
The distance between vectors in the embedding space tells us which words are likely to be similar in meaning: words whose embeddings lie close together are treated as semantically similar.
The L2 norm of the difference between two embeddings u and v gives their Euclidean distance in the embedding space:
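$$ d(\mathbf{u}, \mathbf{v}) = \lVert \mathbf{u} - \mathbf{v} \rVert_2 = \sqrt{\sum_i (u_i - v_i)^2} $$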
The cosine similarity gives the cosine of the angle between two embeddings, which is invariant to their magnitudes:
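$$ \mathrm{cos\_sim}(\mathbf{u}, \mathbf{v}) = \cos\theta = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert_2 \, \lVert \mathbf{v} \rVert_2} $$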
And in code, with PyTorch:
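A minimal sketch, assuming two made-up example vectors (torch.dist gives the same Euclidean distance as the norm of the difference):

```python
import torch
import torch.nn.functional as F

# Two made-up word embeddings
u = torch.tensor([1.0, 2.0, 3.0])
v = torch.tensor([2.0, 2.0, 1.0])

# Euclidean distance: the L2 norm of the difference vector
euclidean = torch.linalg.norm(u - v)  # equivalently: torch.dist(u, v)

# Cosine similarity: cosine of the angle between the embeddings,
# unaffected by their magnitudes
cosine = F.cosine_similarity(u, v, dim=0)

print(euclidean.item(), cosine.item())
```

Note that a smaller Euclidean distance means more similar embeddings, while a larger cosine similarity (closer to 1) means more similar directions.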