Embeddings — FDE@ProdAI Blog

Definition

Embeddings are dense numerical vectors that represent tokens (or sentences, documents, images) in a continuous high-dimensional space. They encode semantic meaning so that similar concepts are geometrically close to each other.

Why Embeddings?

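- Models operate on numbers, so raw text must first be converted into a numerical form
- One-hot encoding (e.g., a 10,000-dimensional sparse vector for a 10,000-token vocabulary) is inefficient and captures no meaning: every pair of distinct tokens is equally far apart
- Embeddings are compact (typically a few hundred to a few thousand dims) and encode rich semantic relationships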
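
To make the one-hot limitation concrete, here is a minimal sketch using only the Python standard library. The token IDs are hypothetical and chosen purely for illustration; the point is that any two distinct one-hot vectors have a dot product of zero, so the encoding cannot express that "king" is closer to "queen" than to "car".

```python
def one_hot(token_id, vocab_size):
    """Return a sparse vector with a single 1.0 at index token_id."""
    vec = [0.0] * vocab_size
    vec[token_id] = 1.0
    return vec


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


VOCAB_SIZE = 10_000

# Hypothetical token IDs, picked only for this example.
king = one_hot(42, VOCAB_SIZE)
queen = one_hot(1337, VOCAB_SIZE)
car = one_hot(7, VOCAB_SIZE)

king_queen = dot(king, queen)  # distinct tokens never overlap
king_car = dot(king, car)      # same score as above: no notion of "closer"
nonzero_fraction = sum(1 for x in king if x != 0.0) / VOCAB_SIZE

print(king_queen, king_car, nonzero_fraction)
```

Only one entry out of 10,000 carries information, and the similarity scores are identical for related and unrelated words. Dense embeddings address both problems.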

How Token Embeddings Work

1. Each token ID maps to a row in an embedding matrix (shape: vocab_size × embed_dim)

2. In the forward pass (training and inference alike), the model does a simple lookup: token_id → embedding vector

3. This embedding matrix is learned during pre-training

4. The same matrix is often used (transposed) at the output layer to predict the next token (weight tying)
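
The four steps above can be sketched in a few lines of plain Python. This is a toy illustration, not how any particular framework implements it: the matrix is randomly initialized rather than learned, the sizes are tiny, and the names `embed` and `tied_logits` are made up for this example.

```python
import random

random.seed(0)

VOCAB_SIZE, EMBED_DIM = 6, 4  # toy sizes; real models are far larger

# Embedding matrix of shape vocab_size x embed_dim (random here, learned in practice).
embedding_matrix = [
    [random.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]
    for _ in range(VOCAB_SIZE)
]


def embed(token_ids):
    """Lookup: each token ID selects one row of the embedding matrix."""
    return [embedding_matrix[t] for t in token_ids]


def tied_logits(hidden):
    """Weight tying: score every vocabulary entry by the dot product of its
    embedding row with the hidden state (hidden times the transposed matrix)."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in embedding_matrix]


token_ids = [3, 1, 3]
vectors = embed(token_ids)

# In a real model the hidden state comes out of the final transformer layer;
# the last input embedding is reused here only to keep the sketch short.
logits = tied_logits(vectors[-1])

print(len(vectors), len(vectors[0]), len(logits))
```

The same token ID always yields the same row, which is why token embeddings on their own are context-free. Context enters only in the layers that follow.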

Dimensions (Typical Values by Model Size)

| Model Size | Embedding Dimension |
|------------|---------------------|
| Small (125M) | 768 |
| Medium (1.3B) | 2048 |
| Large (7B) | 4096 |
| XL (70B+) | 8192 |
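
As a rough worked example, the embedding matrix alone holds vocab_size × embed_dim parameters. The sketch below assumes a vocabulary of 50,000 tokens, a round figure chosen for illustration rather than the vocabulary of any specific model.

```python
VOCAB_SIZE = 50_000  # assumed round number, not tied to a specific model

embed_dims = {
    "Small (125M)": 768,
    "Medium (1.3B)": 2048,
    "Large (7B)": 4096,
    "XL (70B+)": 8192,
}

# Parameters in the token embedding matrix: vocab_size x embed_dim.
embedding_params = {name: VOCAB_SIZE * dim for name, dim in embed_dims.items()}

for name, count in embedding_params.items():
    print(f"{name}: {count / 1e6:.1f}M embedding parameters")
```

Under this assumption, the embedding matrix is a large share of a small model (about 38M of 125M parameters) but a small share of a large one.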

Properties of Good Embeddings

- Semantic similarity: "king" and "queen" are close; "king" and "car" are far
- Arithmetic: king - man + woman ≈ queen (the famous Word2Vec example)
- Contextual vs. static:
  - Static (Word2Vec, GloVe): one fixed vector per word regardless of context
  - Contextual (BERT, GPT): each occurrence gets a different vector based on surrounding tokens
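
The similarity and arithmetic properties can be demonstrated with hand-made vectors. The three dimensions below (roughly "royalty", "gender", "vehicle") and all values are invented for this sketch; real embeddings are learned, have hundreds of dimensions, and satisfy the analogy only approximately.

```python
import math

# Invented 3-d vectors: [royalty, gender, vehicle]. Not real model output.
toy = {
    "king":  [0.9,  0.3, 0.0],
    "queen": [0.9, -0.3, 0.0],
    "man":   [0.1,  0.3, 0.0],
    "woman": [0.1, -0.3, 0.0],
    "car":   [0.0,  0.0, 1.0],
}


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c):
    """Return the word closest to a - b + c, excluding the three input words."""
    target = [x - y + z for x, y, z in zip(toy[a], toy[b], toy[c])]
    candidates = [w for w in toy if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(toy[w], target))


sim_king_queen = cosine(toy["king"], toy["queen"])
sim_king_car = cosine(toy["king"], toy["car"])
result = analogy("king", "man", "woman")

print(sim_king_queen, sim_king_car, result)
```

Excluding the input words from the candidates follows common practice for analogy queries, since the nearest vector to the target is often one of the inputs.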

Types of Embeddings

| Type | Description | Use Case |
|------|-------------|----------|
| Token embeddings | Per-token lookup vectors | Input to every transformer |
| Positional embeddings | Encode position in sequence | Combined with token embeddings |
| Sentence embeddings | Single vector for entire sentence | Semantic search, RAG |
| Image embeddings | Encode visual content | Multimodal models |
| Document embeddings | Encode entire documents | Long-doc retrieval |

Positional Embeddings

Self-attention has no inherent notion of order: all tokens are processed in parallel, so without extra information the model cannot tell one position from another. Positional embeddings add that information:

- Absolute (sinusoidal): used in the original Transformer; fixed sine/cosine patterns
- Learned absolute: trained position vectors (BERT, GPT-2)
- Relative (RoPE): Rotary Position Embedding rotates query and key vectors so that attention scores depend on relative distance; used by LLaMA, Mistral, GPT-NeoX
- ALiBi: adds a linear bias to attention scores based on distance; good for length generalization
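
As an illustration of the sinusoidal variant, the sketch below builds the position table from the original Transformer formulation, where even indices use sin(pos / 10000^(2i/d)) and odd indices use the cosine of the same angle. The function names are made up for this example, and the interleaved sin/cos layout is one common convention; some implementations concatenate the two halves instead.

```python
import math


def sinusoidal_positions(seq_len, embed_dim):
    """Build a seq_len x embed_dim table of fixed sinusoidal encodings."""
    if embed_dim % 2 != 0:
        raise ValueError("embed_dim must be even for sin/cos pairs")
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(embed_dim // 2):
            # Each sin/cos pair has its own wavelength, growing with i.
            angle = pos / (10000 ** (2 * i / embed_dim))
            row.extend([math.sin(angle), math.cos(angle)])
        table.append(row)
    return table


def add_positions(token_vectors):
    """Add the positional table element-wise to a list of token embeddings."""
    pe = sinusoidal_positions(len(token_vectors), len(token_vectors[0]))
    return [
        [t + p for t, p in zip(tok, pos)]
        for tok, pos in zip(token_vectors, pe)
    ]


pe = sinusoidal_positions(seq_len=4, embed_dim=6)
with_positions = add_positions([[0.0] * 4, [0.0] * 4])

print(pe[0])
print(pe[1])
```

Position 0 always encodes to alternating 0 and 1 (sin 0 and cos 0), and each later position gets a distinct pattern, so two identical tokens at different positions end up with different input vectors.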

Embedding Space (Latent Space)

- The space that embedding vectors live in is often called the latent space
- Its high-dimensional geometry encodes language structure: directions and distances carry meaning
- Nearest-neighbor search in this space is the basis of semantic search
- Dimensionality reduction (t-SNE, UMAP) is used to visualize clusters in 2D or 3D

Practical Uses of Embeddings

- Semantic search: embed query + documents → cosine similarity → retrieve relevant docs
- RAG (Retrieval-Augmented Generation): store doc embeddings in a vector DB, retrieve at query time
- Clustering: group similar documents
- Classification: feed the embedding into a classifier head
- Anomaly detection: find outliers in embedding space
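
The retrieval step shared by semantic search and RAG can be sketched as a top-k nearest-neighbor lookup. Everything below is a stand-in: the document IDs and 3-d vectors are invented, the dictionary plays the role of a vector database, and the query vector is hard-coded where a real system would call an embedding model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Hypothetical pre-computed document embeddings (made-up values).
doc_store = {
    "refund-policy":   [0.9, 0.1, 0.0],
    "shipping-times":  [0.2, 0.9, 0.1],
    "api-rate-limits": [0.0, 0.1, 0.9],
}


def retrieve(query_vec, store, k=2):
    """Return the IDs of the k documents most similar to the query."""
    scored = sorted(
        ((cosine(query_vec, vec), doc_id) for doc_id, vec in store.items()),
        reverse=True,
    )
    return [doc_id for _, doc_id in scored[:k]]


# Stand-in for the embedding of a query such as "how do I get my money back?"
query_vec = [0.8, 0.3, 0.0]
top_docs = retrieve(query_vec, doc_store, k=2)

print(top_docs)
```

This brute-force scan compares the query against every document, which is fine for small collections. Vector databases typically replace it with an approximate nearest-neighbor index to keep query time low at scale.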

Similarity Metrics

| Metric | Formula | Notes |
|--------|---------|-------|
| Cosine similarity | cos(θ) = A·B / (‖A‖ ‖B‖) | Most common, direction-based |
| Dot product | A·B | Fast, used in attention |
| Euclidean distance | √(Σ(aᵢ − bᵢ)²) | Magnitude-sensitive |
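
The three metrics in the table can be compared side by side on a small example. The vectors are arbitrary; `b` is `a` scaled by two, to show which metrics react to magnitude and which only to direction.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def cosine_similarity(a, b):
    return dot(a, b) / (norm(a) * norm(b))


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(a):
    """Scale a vector to unit length."""
    n = norm(a)
    return [x / n for x in a]


a = [3.0, 4.0]
b = [6.0, 8.0]  # same direction as a, twice the magnitude

cos_ab = cosine_similarity(a, b)    # direction only
dot_ab = dot(a, b)                  # grows with magnitude
dist_ab = euclidean_distance(a, b)  # nonzero despite identical direction

# After normalization, dot product and cosine similarity coincide.
dot_unit = dot(normalize(a), normalize(b))

print(cos_ab, dot_ab, dist_ab, dot_unit)
```

This is why many pipelines normalize embeddings once at indexing time: the cheaper dot product then gives the same ranking as cosine similarity.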

Popular Embedding Models

- text-embedding-3-large (OpenAI): 3072 dims
- amazon.titan-embed-text-v2 (AWS Bedrock)
- all-MiniLM-L6-v2 (HuggingFace/SentenceTransformers): lightweight
- nomic-embed-text: open source, long context

Related Concepts

Token, Tokenization, Latent Space, Attention, RAG, Vector Database