Knowledge Base

Deep technical articles on AI/ML — free, always. Build understanding from first principles.

LLM

LLM (Large Language Model)

A Large Language Model is a deep learning model trained on massive text corpora that generates text by predicting the most probable next token given a…

beginner·3 min read

Token

A token is the atomic unit of text that an LLM processes. It is a piece of text (such as a word, sub-word, character, or punctuation symbol) that…

beginner·3 min read

Tokenization

Tokenization is the process of converting raw text into a sequence of tokens: the discrete numeric IDs that an LLM can process. It is the first and…

beginner·3 min read
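As a concrete illustration of the text-to-IDs step, here is a minimal sketch of greedy longest-match subword tokenization, a simple scheme related to WordPiece. Production LLM tokenizers (for example BPE-based ones) build and apply their vocabularies differently; the tiny vocabulary `VOCAB` and the function `tokenize` are hypothetical.

```python
# Hypothetical toy vocabulary: subword pieces mapped to integer IDs.
VOCAB = {"un": 0, "believ": 1, "able": 2, "a": 3, "b": 4, "l": 5, "e": 6, " ": 7}

def tokenize(text: str, vocab: dict[str, int]) -> list[int]:
    """Greedily match the longest vocabulary piece at each position."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest substring first, shrinking until a piece matches.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary piece matches at position {i}")
    return ids

print(tokenize("unbelievable", VOCAB))
```

Real tokenizers avoid the error branch by falling back to bytes or single characters, so any input can be encoded.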

Embeddings

Embeddings are dense numerical vectors that represent tokens (or sentences, documents, images) in a continuous high-dimensional space. They encode…

beginner·3 min read
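To show how embeddings let similarity be measured geometrically, the sketch below compares vectors with cosine similarity, one common measure for this purpose. The 4-dimensional vectors in `EMBEDDINGS` are hypothetical values chosen by hand; real embeddings are learned and have far more dimensions.

```python
import math

# Hypothetical hand-made embeddings; real models learn these vectors.
EMBEDDINGS = {
    "cat": [0.9, 0.8, 0.1, 0.0],
    "dog": [0.8, 0.9, 0.2, 0.1],
    "car": [0.1, 0.0, 0.9, 0.8],
}

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(round(cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["dog"]), 3))
print(round(cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["car"]), 3))
```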

Latent Space

Latent space is the high-dimensional mathematical space in which embeddings are organized. It is a learned, continuous representation space where the…

beginner·4 min read

Parameters

Parameters are the internal numerical variables of a neural network that are learned during training. They store the model's "knowledge": all the…

beginner·3 min read

Pre-training

Pre-training is the initial, large-scale training phase where a model learns general language understanding and generation capabilities by training on…

intermediate·3 min read

Base Model

A base model (also called a foundation model or pretrained model) is an LLM that has completed pre-training on massive text corpora but has NOT yet…

intermediate·3 min read

Instruct Model

An instruct model is a base model that has been further trained (via Supervised Fine-Tuning and/or RLHF) to understand and follow natural language…

intermediate·4 min read

Fine-Tuning

Fine-tuning is the process of continuing to train a pre-trained model on a smaller, task-specific or domain-specific dataset to adapt its behavior. It…

intermediate·4 min read

Alignment

Alignment is the process of ensuring that an LLM's behavior is helpful, honest, and harmless, meaning that it acts in accordance with human values and intent…

intermediate·4 min read

RLHF (Reinforcement Learning from Human Feedback)

RLHF is a training technique that uses human preference judgments to guide LLM behavior. Instead of telling the model what the "correct" answer is…

intermediate·4 min read

Prompt

A prompt is the complete input sent to an LLM: everything the model receives before it begins generating output. It includes the user's question or…

beginner·3 min read

System Prompt

A system prompt is a special, high-priority block of instructions provided to an LLM at the start of a conversation that defines the model's persona…

beginner·4 min read

User Prompt

A user prompt is the specific question, instruction, or input provided by the end user to the LLM in a conversation turn. It is the runtime input that…

beginner·4 min read

Context Window

The context window is the maximum number of tokens an LLM can process in a single forward pass: the total span of text the model can "see" at once…

intermediate·4 min read

Zero-Shot

Zero-shot prompting is the technique of asking an LLM to perform a task without providing any examples of the desired input-output behavior in the…

beginner·3 min read

Few-Shot

Few-shot prompting is the technique of providing a small number of input-output examples (demonstrations) within the prompt to guide the LLM's…

beginner·4 min read
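A few-shot prompt is ordinary text assembled from demonstrations followed by the new input, with the final output left blank for the model to complete. This sketch assumes a hypothetical sentiment-labeling task; the examples, labels, and the `Review:`/`Sentiment:` layout are illustrative choices, not a required format.

```python
# Hypothetical demonstrations for a sentiment-labeling task.
EXAMPLES = [
    ("The food was amazing!", "positive"),
    ("I waited an hour and left.", "negative"),
]

def build_few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Concatenate input-output demonstrations, then leave the last output blank."""
    blocks = ["Classify the sentiment of each review."]
    for text, label in examples:
        blocks.append(f"Review: {text}\nSentiment: {label}")
    # The model is expected to continue the pattern after the final "Sentiment:".
    blocks.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(blocks)

print(build_few_shot_prompt(EXAMPLES, "Great service, will return."))
```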

Chain of Thought (CoT)

Chain of Thought (CoT) prompting is a technique that encourages an LLM to produce intermediate reasoning steps before arriving at a final answer. By…

beginner·4 min read

Inference

Inference is the process of generating output tokens from a trained LLM. It is the "prediction" phase: using the frozen, trained model weights to…

intermediate·5 min read

Latency

Latency is the time elapsed between submitting a prompt to an LLM and receiving the output. It encompasses network transmission, server-side queuing…

intermediate·5 min read

Temperature

Temperature is a hyperparameter that controls the randomness (or "creativity") of an LLM's output during token sampling. It scales the model's raw…

intermediate·3 min read
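Temperature divides the logits before softmax, so T < 1 sharpens the distribution toward the top token and T > 1 flattens it. A minimal sketch, assuming hypothetical logits for a three-token vocabulary:

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Divide logits by the temperature, then normalize with softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive (use greedy argmax for T = 0)")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for a 3-token vocabulary
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```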

Hallucination

Hallucination is the phenomenon where an LLM confidently generates information that is factually incorrect, fabricated, or not supported by its…

intermediate·5 min read

Grounding

Grounding is the practice of constraining an LLM's outputs to provided, verifiable information, "grounding" the model's responses in a factual…

intermediate·4 min read

RAG (Retrieval-Augmented Generation)

RAG (Retrieval-Augmented Generation) is an architecture pattern that combines information retrieval with LLM generation. Instead of relying solely on…

advanced·5 min read
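The retrieve-then-generate pattern can be sketched in a few lines. To stay self-contained, this assumes a hypothetical in-memory `DOCUMENTS` list and ranks documents by word overlap, a crude stand-in for the embedding-based vector search most RAG systems use; the prompt wording is also illustrative.

```python
import re

# Hypothetical knowledge base.
DOCUMENTS = [
    "Refunds are accepted within 30 days of purchase.",
    "Support is available Monday to Friday.",
    "Shipping is free on orders over $50.",
]

def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query (stand-in for vector similarity)."""
    q = words(query)
    return sorted(docs, key=lambda d: len(q & words(d)), reverse=True)[:k]

def build_rag_prompt(query: str, docs: list[str]) -> str:
    """Insert the retrieved passages into the prompt ahead of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

print(build_rag_prompt("How many days do I have for refunds after purchase?", DOCUMENTS))
```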

Workflow

In the context of LLMs, a workflow is a fixed, predefined sequence of steps in which an LLM (and potentially other tools) follows a predetermined path to…

advanced·4 min read

Agent (LLM Agent)

An LLM agent is a system where an LLM acts as the reasoning engine (the "brain") that dynamically plans actions, selects tools to use, executes those…

advanced·5 min read

Multimodality

Multimodality in LLMs refers to the ability of a model to process, understand, and generate content across multiple types of data modalities, not…

advanced·5 min read

Benchmarks

Benchmarks are standardized test datasets and evaluation protocols used to measure, compare, and track the capabilities of LLMs across specific tasks…

advanced·5 min read

Guardrails

Guardrails are safety and control mechanisms (typically applied at the application layer, around an LLM) that detect, block, or filter unsafe…

intermediate·5 min read

Transformer

The Transformer is the neural network architecture that underlies nearly all modern LLMs. Introduced in the paper "Attention Is All You Need" by Vaswani et al.…

intermediate·4 min read

Attention / Self-Attention

Attention is the core mechanism of Transformers that allows each token to dynamically focus on (or "attend to") other tokens in the sequence based on…

intermediate·5 min read
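Scaled dot-product attention, as defined in "Attention Is All You Need", computes softmax(QKᵀ / √d_k)·V. The pure-Python sketch below implements that formula for a single head with no causal mask; the input vectors are hypothetical and are reused directly as Q, K, and V instead of passing through learned projection matrices.

```python
import math

def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # shift for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q: list[list[float]], K: list[list[float]], V: list[list[float]]) -> list[list[float]]:
    """softmax(Q K^T / sqrt(d_k)) V, computed one query row at a time."""
    d_k = len(K[0])
    outputs = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Output row = attention-weighted sum of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return outputs

# Self-attention over three hypothetical token vectors (Q = K = V = X).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in attention(X, X, X):
    print([round(x, 3) for x in row])
```

Decoder-only LLMs additionally apply a causal mask so each position attends only to itself and earlier positions.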

Scaling Laws

Scaling laws are empirical relationships that describe how LLM performance (measured by loss) improves predictably and smoothly as a function of three…

intermediate·5 min read
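Scaling laws are usually expressed as power laws. As a sketch, assume loss depends on parameter count N as L(N) = (N_c / N)^α; the constants below are illustrative placeholders, not fitted values from any study. The key property is that every 10× increase in N multiplies the loss by the same factor, 10^(−α), which is why such curves appear as straight lines on log-log plots.

```python
def power_law_loss(n_params: float, n_c: float, alpha: float) -> float:
    """L(N) = (N_c / N) ** alpha: loss falls smoothly as parameter count N grows."""
    return (n_c / n_params) ** alpha

# Illustrative placeholder constants, not fitted values.
N_C, ALPHA = 1e14, 0.08

for n in (1e8, 1e9, 1e10, 1e11):
    print(f"N = {n:.0e}: loss = {power_law_loss(n, N_C, ALPHA):.3f}")

# Each 10x increase in N multiplies the loss by the same factor, 10 ** -alpha.
ratio = power_law_loss(1e9, N_C, ALPHA) / power_law_loss(1e8, N_C, ALPHA)
print(f"ratio per 10x: {ratio:.4f}")
```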

Logits and Softmax

**Logits** are the raw, unnormalized output scores the LLM produces for every token in its vocabulary at each generation step. **Softmax** is the…

intermediate·4 min read
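A minimal sketch of the logits-to-probabilities step, assuming a hypothetical three-token vocabulary and hand-picked logits. Subtracting the maximum logit before exponentiating leaves the softmax result unchanged but prevents overflow; the last step picks the most probable token, which is greedy decoding.

```python
import math

def softmax(logits: list[float]) -> list[float]:
    """Turn raw logits into a probability distribution that sums to 1."""
    m = max(logits)  # shifting by the max does not change the result but avoids overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["cat", "dog", "the"]   # hypothetical 3-token vocabulary
logits = [3.2, 1.1, -0.5]       # hypothetical raw scores for the next token
probs = softmax(logits)
best = max(range(len(vocab)), key=lambda i: probs[i])
print(vocab[best], [round(p, 3) for p in probs])
```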

Top-P and Top-K Sampling

**Top-K** and **Top-P (nucleus sampling)** are token filtering strategies applied after temperature scaling that restrict which tokens can be sampled…

intermediate·4 min read
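Both filters can be sketched as functions that zero out excluded tokens and renormalize the survivors before sampling. The probability list is hypothetical and assumed to already be temperature-scaled; real samplers operate on tensors, but the selection logic is the same.

```python
def top_k_filter(probs: list[float], k: int) -> list[float]:
    """Keep the k most probable tokens, zero the rest, and renormalize."""
    keep = set(sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k])
    filtered = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(filtered)
    return [p / total for p in filtered]

def top_p_filter(probs: list[float], p: float) -> list[float]:
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    filtered = [q if i in keep else 0.0 for i, q in enumerate(probs)]
    total = sum(filtered)
    return [q / total for q in filtered]

probs = [0.5, 0.3, 0.15, 0.05]  # hypothetical next-token distribution
print(top_k_filter(probs, 2))
print(top_p_filter(probs, 0.9))
```

Top-K always keeps a fixed number of candidates, while Top-P adapts: a confident distribution keeps few tokens and a flat one keeps many.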

LoRA and PEFT (Parameter-Efficient Fine-Tuning)

**PEFT (Parameter-Efficient Fine-Tuning)** is a family of techniques that fine-tune only a tiny fraction of a model's parameters instead of updating…

intermediate·5 min read
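LoRA, a widely used PEFT method, freezes a weight matrix W and learns a low-rank update B·A, where B is d_out × r and A is r × d_in. This sketch counts trainable parameters for a hypothetical 4096 × 4096 layer at rank 8 and shows the forward pass h = Wx + (α/r)·BAx; the α/r scaling follows the convention used in the LoRA paper. The matrices in the worked example are made up.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Trainable parameters: full update of W vs. LoRA factors B (d_out x r) and A (r x d_in)."""
    return d_out * d_in, rank * (d_out + d_in)

def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def lora_forward(w, a, b, x, alpha: float, rank: int) -> list[float]:
    """h = W x + (alpha / r) * B (A x). W is frozen; only A and B would receive gradients."""
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    return [h + (alpha / rank) * d for h, d in zip(base, delta)]

full, lora = lora_param_counts(4096, 4096, rank=8)
print(full, lora, f"{lora / full:.2%}")

# Tiny worked example: 2x2 frozen W with a rank-1 update.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]      # r x d_in
B = [[1.0], [0.0]]    # d_out x r
print(lora_forward(W, A, B, [1.0, 2.0], alpha=2.0, rank=1))
```

In LoRA, B is initialized to zero, so the adapted layer starts out identical to the frozen one.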

Quantization

Quantization is the process of reducing the numerical precision of a model's weights (and sometimes activations) from higher-bit formats (float32…

intermediate·4 min read
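A minimal sketch of one simple scheme, symmetric absmax int8 quantization: a single scale maps the largest-magnitude weight to ±127, and dequantizing multiplies back. The weights are hypothetical; practical methods typically use per-channel or per-group scales and more careful rounding.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric absmax quantization: one scale maps max|w| to 127."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # round() then clamp to the signed 8-bit range used here, [-127, 127].
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Approximate reconstruction of the original weights."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 0.01]  # hypothetical float32 weights
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
print(q)
print([round(r, 4) for r in restored])
# Rounding error per weight is at most scale / 2.
print(max(abs(w - r) for w, r in zip(weights, restored)) <= scale / 2 + 1e-12)
```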

Tool Use / Function Calling

Tool use (also called function calling) is the LLM's ability to output a structured request to invoke an external function, API, or capability, and…

advanced·5 min read
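The application side of tool use can be sketched as: parse the model's structured request, look up the named function, run it, and hand the result back. The JSON shape `{"tool": ..., "arguments": ...}` and the `get_weather` stub are hypothetical; each provider defines its own tool-call schema.

```python
import json

def get_weather(city: str) -> str:
    """Hypothetical tool stub; a real tool would call an external API."""
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}  # registry of functions the model may request

def handle_model_output(raw: str) -> str:
    """Parse a (hypothetically formatted) tool-call request and execute it."""
    call = json.loads(raw)
    fn = TOOLS.get(call["tool"])
    if fn is None:
        raise ValueError(f"unknown tool: {call['tool']}")
    result = fn(**call["arguments"])
    # In a full loop, this result is sent back to the model as a new message.
    return result

print(handle_model_output('{"tool": "get_weather", "arguments": {"city": "Oslo"}}'))
```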

Structured Output

Structured output is the ability to constrain an LLM to produce output that conforms to a specific format or schema, most commonly JSON, but also XML…

advanced·4 min read
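Constrained decoding enforces a format during generation; a complementary, provider-independent safeguard is validating the output afterward. This sketch checks parsed JSON against a hypothetical minimal schema of field names and Python types; a production system would more likely use JSON Schema and retry on failure.

```python
import json

# Hypothetical schema: required field name -> expected Python type.
SCHEMA = {"name": str, "age": int, "tags": list}

def parse_structured(raw: str, schema: dict[str, type]) -> dict:
    """Parse model output as JSON and check it against a minimal field/type schema."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected in schema.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected):
            raise ValueError(f"field {field!r} should be {expected.__name__}")
    return data

print(parse_structured('{"name": "Ada", "age": 36, "tags": ["math"]}', SCHEMA))
```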

Prompt Injection

Prompt injection is a security attack where malicious content embedded in user input or external data overrides or manipulates the LLM's instructions…

intermediate·5 min read

Reasoning Models / Extended Thinking

Reasoning models are a class of LLMs that perform extended internal chain-of-thought before producing a final answer, trading increased inference…

intermediate·5 min read

KV Cache (Key-Value Cache)

The KV Cache (Key-Value Cache) is an optimization that stores the computed attention Key and Value matrices for all previously processed tokens during…

intermediate·4 min read
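A sketch of the idea, assuming toy 2-dimensional vectors and identity "projections" in place of learned matrices: at each decoding step only the new token's key and value are computed and appended, while earlier ones are read back from the cache instead of being recomputed. The `KVCache` class is hypothetical, not any library's API.

```python
import math

class KVCache:
    """Hypothetical cache of past keys and values for one attention head."""

    def __init__(self) -> None:
        self.keys: list[list[float]] = []
        self.values: list[list[float]] = []

    def append(self, k: list[float], v: list[float]) -> None:
        self.keys.append(k)
        self.values.append(v)

    def attend(self, q: list[float]) -> list[float]:
        """Attend from the newest query to every cached position (causal by construction)."""
        scale = math.sqrt(len(q))
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in self.keys]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        dim = len(self.values[0])
        return [sum(w * v[j] for w, v in zip(weights, self.values)) / total for j in range(dim)]

cache = KVCache()
for step, x in enumerate([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]):
    q, k, v = x, x, x        # identity "projections", purely for illustration
    cache.append(k, v)       # K and V are computed once, for the new token only
    out = cache.attend(q)    # earlier K/V are reused from the cache
    print(step, len(cache.keys), [round(o, 3) for o in out])
```

The trade-off is memory: the cache grows linearly with sequence length for every layer and head.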

DPO (Direct Preference Optimization)

DPO (Direct Preference Optimization) is a simpler alternative to RLHF for aligning LLMs with human preferences. It directly fine-tunes the model on…

intermediate·4 min read
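The DPO objective for one preference pair can be written using four sequence log-probabilities: the trainable policy's and a frozen reference model's, each on the chosen and the rejected response. The sketch computes −log σ(β · margin); the log-probability values and the default β = 0.1 are illustrative assumptions.

```python
import math

def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """-log sigmoid(beta * [(log pi(y_w) - log ref(y_w)) - (log pi(y_l) - log ref(y_l))])."""
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # -log(sigmoid(z)) equals softplus(-z).
    return softplus(-beta * margin)

# Hypothetical log-probabilities: the policy favors the chosen answer more than the reference does.
print(round(dpo_loss(-12.0, -15.0, -13.0, -14.0), 4))
```

When the policy matches the reference, the margin is zero and the loss is log 2; it falls as the policy raises the chosen response's relative likelihood.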

Mixture of Experts (MoE)

Mixture of Experts (MoE) is a neural network architecture where only a subset of the model's parameters are activated for each input token. Instead of…

intermediate·4 min read
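One common routing variant can be sketched as follows: a router scores every expert, only the top-k experts actually run, and their outputs are mixed using softmax weights over the selected scores. The gate vectors and the scaling "experts" are hypothetical; in real MoE layers each expert is a feed-forward network.

```python
import math
from typing import Callable

Expert = Callable[[list[float]], list[float]]

def moe_layer(x: list[float], gate_vectors: list[list[float]], experts: list[Expert], top_k: int = 2) -> list[float]:
    """Route one token vector to its top_k experts and mix their outputs.

    Assumes each expert maps a d-dimensional vector to a d-dimensional vector.
    """
    # Router: one score per expert (dot product with that expert's gate vector).
    scores = [sum(g * xi for g, xi in zip(gv, x)) for gv in gate_vectors]
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Softmax over the selected experts' scores only.
    m = max(scores[i] for i in chosen)
    exps = {i: math.exp(scores[i] - m) for i in chosen}
    total = sum(exps.values())
    out = [0.0] * len(x)
    for i in chosen:  # unselected experts are never called
        y = experts[i](x)
        out = [o + (exps[i] / total) * yj for o, yj in zip(out, y)]
    return out

calls: list[int] = []  # records which experts ran

def make_expert(idx: int, factor: float) -> Expert:
    def expert(v: list[float]) -> list[float]:
        calls.append(idx)
        return [factor * vi for vi in v]
    return expert

experts = [make_expert(i, f) for i, f in enumerate([1.0, 2.0, 3.0, 4.0])]
gates = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
print(moe_layer([1.0, 1.0], gates, experts), sorted(calls))
```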

Evals (Evaluation Frameworks)

Evals (evaluations) are systematic testing frameworks that measure how well an LLM performs on a specific task, use case, or quality dimension. Unlike…

advanced·5 min read

Emergent Abilities

Emergent abilities are capabilities that appear in LLMs at sufficient scale: they are absent or near-random in smaller models, then appear sharply…

advanced·4 min read

MCP (Model Context Protocol)

MCP (Model Context Protocol) is an open protocol developed by Anthropic that standardizes how LLMs connect to external tools, data sources, and…

advanced·4 min read

Streaming (Token Streaming)

Streaming is the practice of delivering LLM output tokens to the user incrementally as they are generated, rather than waiting for the complete…

intermediate·4 min read
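On the client side, streaming amounts to consuming an iterator and rendering each token as it arrives. This sketch uses a hypothetical generator, `fake_model_stream`, in place of a real provider's streaming API.

```python
import time
from typing import Iterator

def fake_model_stream(tokens: list[str], delay_s: float = 0.0) -> Iterator[str]:
    """Hypothetical stand-in for a streaming API: yields each token once it is 'generated'."""
    for tok in tokens:
        time.sleep(delay_s)  # simulated per-token generation time
        yield tok

received = []
for tok in fake_model_stream(["Hello", ",", " world", "!"]):
    received.append(tok)
    print(tok, end="", flush=True)  # display immediately instead of waiting for the full reply
print()
```

Total generation time is unchanged; what improves is time to first token, which is what users perceive as responsiveness.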

LLM-as-Judge

LLM-as-Judge is an evaluation technique where a language model is used to assess the quality of another language model's outputs, acting as an…

advanced·5 min read

Prompt Caching

Prompt caching is an optimization where the LLM provider precomputes and stores the KV (key-value) cache for a repeated portion of the prompt…

advanced·4 min read

In-Context Learning (ICL)

In-Context Learning (ICL) is the emergent ability of LLMs to learn a new task or adapt to new patterns by reading examples provided in the prompt…

beginner·4 min read

Next-Token Prediction

Next-token prediction (also called autoregressive language modeling or causal language modeling) is the core training objective of most modern LLMs…

beginner·4 min read
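The objective can be illustrated with a toy stand-in: a bigram model that "trains" by counting which token follows which, generates autoregressively by repeatedly appending the most probable next token, and is scored by the average negative log-likelihood of each actual next token (the cross-entropy quantity that LLM pre-training minimizes). The corpus is hypothetical, and a bigram model conditions on only one previous token, unlike an LLM.

```python
import math
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the cat ate .".split()  # hypothetical tiny corpus

# "Training": count which token follows which.
counts: dict[str, Counter] = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_token_probs(prev: str) -> dict[str, float]:
    c = counts[prev]
    total = sum(c.values())
    return {tok: n / total for tok, n in c.items()}

def generate(start: str, steps: int) -> list[str]:
    """Autoregressive greedy decoding: append the most probable next token, repeat."""
    out = [start]
    for _ in range(steps):
        probs = next_token_probs(out[-1])
        if not probs:
            break
        out.append(max(probs, key=probs.get))
    return out

# Average negative log-probability of each actual next token in the corpus.
pairs = list(zip(corpus, corpus[1:]))
nll = -sum(math.log(next_token_probs(p)[n]) for p, n in pairs) / len(pairs)
print(generate("the", 3), round(nll, 3))
```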
