Definition
A base model (also called a foundation model or pretrained model) is an LLM that has completed pre-training on massive text corpora but has NOT yet undergone instruction tuning or alignment. It is a general-purpose text completion engine — it predicts the next token, but does not reliably follow instructions or behave like an assistant.
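As a sketch of what "predicts the next token" means formally, the standard autoregressive language-modeling objective can be written as below. The notation is introduced here for illustration and does not come from the text above: $x_t$ is the token at position $t$, $T$ is the sequence length, and $\theta$ denotes the model parameters.

$$
p_\theta(x_1, \dots, x_T) = \prod_{t=1}^{T} p_\theta(x_t \mid x_{<t}),
\qquad
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta(x_t \mid x_{<t})
$$

Pre-training minimizes this loss over the corpus. Nothing in it rewards answering a question rather than continuing it, which is why instruction following has to be added in a later stage.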
Behavior of a Base Model
Given a prompt, a base model continues the text in whatever way is statistically most likely given its training data:
| Prompt | Base Model Response |
|--------|-------------------|
| "What is the capital of France?" | "What is the capital of France? What is the capital of Germany? What is the capital of..." (continues the quiz pattern) |
| "Write a function to sort a list:" | Likely produces code (this pattern is common in training data) |
| "You are a helpful assistant." | May continue: "You are a helpful assistant. You are available 24/7..." |
The model has no notion that the user wants an answer; it simply continues the pattern it sees.
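The completion behavior can be imitated with a toy word-level bigram counter. This is only an illustrative sketch: a real base model is a neural network over subword tokens, and the names `train_bigram`, `complete`, and the tiny `corpus` are hypothetical. The loop has the same shape, though: look at the context, append the most likely next token, repeat.

```python
from collections import Counter, defaultdict


def train_bigram(corpus: str) -> dict:
    """Count, for every word, which words follow it in the corpus."""
    follows = defaultdict(Counter)
    words = corpus.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def complete(follows: dict, prompt: str, max_new_tokens: int = 5) -> str:
    """Greedily append the most frequent next word, one token at a time."""
    tokens = prompt.split()
    if not tokens:
        return prompt  # nothing to condition on
    for _ in range(max_new_tokens):
        candidates = follows.get(tokens[-1])
        if not candidates:
            break  # no word ever followed this one in the training text
        tokens.append(candidates.most_common(1)[0][0])
    return " ".join(tokens)


# Hypothetical training text: quiz questions with no answers attached.
corpus = (
    "What is the capital of France? "
    "What is the capital of Germany? "
    "What is the capital of Spain?"
)
model = train_bigram(corpus)

print(complete(model, "What is the capital of France?", max_new_tokens=5))
# prints: What is the capital of France? What is the capital of
```

Because the toy training text contains only questions, the most likely continuation of a question is another question, mirroring the first row of the table above.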
Why Base Models Matter
- They are the starting point for all downstream models
- The quality of the base model largely sets the ceiling for its fine-tuned variants
- Better pre-training data quality tends to yield better reasoning and fewer hallucinations in derived models
- A single base model can be fine-tuned for many different downstream tasks
Where to Find Base Models
- Hugging Face Model Hub (`meta-llama/Meta-Llama-3-8B`)
- AWS Bedrock (some base models available)
- Replicate, Together AI (hosted inference)
- Direct download + run with Ollama, llama.cpp, vLLM
Related Terms
- Pre-training, Instruct Model, Fine-Tuning, RLHF, Parameters, Foundation Model
Base Model vs. Instruct Model
| Aspect | Base Model | Instruct Model |
|--------|-----------|----------------|
| Training | Pre-training only | Pre-training + SFT, often + RLHF |
| Behavior | Text completion | Instruction following |
| Use case | Fine-tuning starting point | Direct user interaction |
| Reliability | Unpredictable for tasks | Task-oriented, helpful |
| Safety | No safety training | Has refusal behaviors |
Capabilities of a Strong Base Model
Despite not following instructions, a strong base model:
- Has extensive world knowledge encoded in its parameters
- Can perform few-shot tasks when given examples in context
- Demonstrates emergent reasoning at sufficient scale
- Handles code, math, and multiple languages
- Forms the backbone of instruct/chat models
Few-Shot Prompting with Base Models
Base models respond well to in-context demonstrations:
```
Q: What is 2+2?
A: 4
Q: What is 10+15?
A: 25
Q: What is 7+8?
A: ← base model will complete with "15"
```
This is the basis of few-shot learning: the task is specified by the examples in the prompt, with no change to the model's weights.
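A prompt in this Q/A layout can be assembled programmatically. The following is a minimal sketch; the helper names `build_few_shot_prompt` and `extract_answer` are hypothetical and not part of any library, and the `Q:`/`A:` prefixes are simply the ones used in the example above.

```python
def build_few_shot_prompt(examples, query, q_prefix="Q: ", a_prefix="A:"):
    """Format (question, answer) pairs plus a final open question.

    The prompt ends with a bare answer prefix so that the most likely
    continuation is the answer to `query`.
    """
    lines = []
    for question, answer in examples:
        lines.append(f"{q_prefix}{question}")
        lines.append(f"{a_prefix} {answer}")
    lines.append(f"{q_prefix}{query}")
    lines.append(a_prefix)  # left open for the model to complete
    return "\n".join(lines)


def extract_answer(completion):
    """Keep only the first non-empty line of the model's continuation.

    A base model tends to keep going and invent further Q/A pairs,
    so everything after the first line is discarded.
    """
    stripped = completion.strip()
    return stripped.splitlines()[0].strip() if stripped else ""


examples = [("What is 2+2?", "4"), ("What is 10+15?", "25")]
prompt = build_few_shot_prompt(examples, "What is 7+8?")
print(prompt)
```

The resulting string is sent to the model as plain text. Truncating the continuation at the first line (or passing a stop sequence such as `"\nQ:"`, where the serving API supports one) keeps the model from generating additional questions of its own.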
Examples of Base Models
| Model | Organization | Notes |
|-------|-------------|-------|
| GPT-3 (davinci) | OpenAI | Classic base model |
| Llama 3 (base) | Meta | Open weights, widely used |
| Mistral 7B (base) | Mistral AI | Efficient, strong base |
| Falcon (base) | TII | Open base model |
| Gemma (base) | Google | Lightweight base |
Use Cases for Base Models (Practitioner View)
1. Starting point for fine-tuning: domain-specific adaptation (medical, legal, code)
2. Research: studying emergent capabilities, scaling laws, mechanistic interpretability
3. Custom alignment: organizations that want to apply their own RLHF pipeline
4. Evaluation: benchmarking pre-training quality before alignment