Base Model

Definition

A base model (also called a foundation model or pretrained model) is an LLM that has completed pre-training on massive text corpora but has NOT yet undergone instruction tuning or alignment. It is a general-purpose text completion engine — it predicts the next token, but does not reliably follow instructions or behave like an assistant.

Behavior of a Base Model

Given a prompt, a base model will continue the text in whatever way is most statistically likely based on training data:

| Prompt | Base Model Response |
|--------|---------------------|
| "What is the capital of France?" | "What is the capital of France? What is the capital of Germany? What is the capital of..." (continues the quiz pattern) |
| "Write a function to sort a list:" | Likely produces code, since this pattern is common in training data |
| "You are a helpful assistant." | May continue: "You are a helpful assistant. You are available 24/7..." |

It does NOT understand that the user wants an answer — it just completes the pattern.
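
To make "it just completes the pattern" concrete, here is a toy sketch, not a real LLM: a word-level bigram model trained on a few quiz-style lines and decoded greedily. The corpus and the names `train_bigram` and `complete` are hypothetical, chosen for illustration. Because all it has learned is which word tends to follow which, it extends the list of questions instead of answering.

```python
from collections import Counter, defaultdict

# Hypothetical toy "pre-training corpus" of quiz-style text.
CORPUS = (
    "What is the capital of France ? "
    "What is the capital of Germany ? "
    "What is the capital of Spain ? "
    "What is the capital of Italy ?"
)


def train_bigram(text: str) -> dict[str, Counter]:
    """Count how often each word follows each other word."""
    tokens = text.split()
    counts: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def complete(counts: dict[str, Counter], prompt: str, max_new_tokens: int = 6) -> str:
    """Greedy decoding: repeatedly append the most frequent next word."""
    tokens = prompt.split()
    if not tokens:
        return prompt
    for _ in range(max_new_tokens):
        followers = counts.get(tokens[-1])
        if not followers:
            break  # this word was never followed by anything in the corpus
        tokens.append(followers.most_common(1)[0][0])
    return " ".join(tokens)


model = train_bigram(CORPUS)
print(complete(model, "What is the capital of France ?"))
```

A real base model learns far richer statistics from vastly more text, so it may well answer "Paris" when that continuation is likelier in its training data; the point is only that the objective rewards likely continuations, not helpful answers.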

Why Base Models Matter

  • They are the starting point for instruct, chat, and other fine-tuned models
  • The quality of the base model largely sets the ceiling for its fine-tuned variants
  • Higher-quality pre-training data tends to yield better reasoning and fewer hallucinations in derived models
  • Base models can be fine-tuned for many different downstream tasks from a single foundation

Base Model vs. Instruct Model

| Aspect | Base Model | Instruct Model |
|--------|------------|----------------|
| Training | Pre-training only | Pre-training + SFT, usually followed by RLHF or another preference-tuning step |
| Behavior | Text completion | Instruction following |
| Use case | Starting point for fine-tuning | Direct user interaction |
| Reliability | Unpredictable on instructed tasks | Task-oriented, helpful |
| Safety | No safety training | Trained to refuse some harmful requests |
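
The difference in the Training row can be sketched at the level of training labels. This is a minimal sketch of one common convention, assuming token IDs already come from some tokenizer and that a label of -100 is skipped by the loss (it is the default `ignore_index` of PyTorch's `torch.nn.CrossEntropyLoss`). During pre-training every token is a target; in SFT the prompt tokens are masked so the loss covers only the response. The helper names and token IDs are hypothetical, and RLHF is not shown.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def pretraining_labels(token_ids: list[int]) -> list[int]:
    """Pre-training: the model learns to predict every token in the text."""
    return list(token_ids)


def sft_example(prompt_ids: list[int], response_ids: list[int]) -> tuple[list[int], list[int]]:
    """SFT: the model reads prompt + response, but only response tokens count toward the loss."""
    input_ids = prompt_ids + response_ids
    # Labels line up with input_ids; Hugging Face causal-LM models shift them
    # by one position internally when computing the next-token loss.
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


# Hypothetical token IDs for an instruction and its reference answer.
prompt_ids = [101, 2054, 2003]
response_ids = [3000, 102]
print(pretraining_labels(prompt_ids + response_ids))
print(sft_example(prompt_ids, response_ids))
```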

Capabilities of a Strong Base Model

Despite not following instructions, a strong base model:

  • Has extensive world knowledge encoded in parameters
  • Can perform few-shot tasks when given examples in context
  • Demonstrates emergent reasoning at sufficient scale
  • Handles code, math, and multiple languages
  • Forms the backbone of instruct/chat models
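
Few-Shot Prompting with Base Models

Base models respond well to in-context demonstrations:

```
Q: What is 2+2?
A: 4
Q: What is 10+15?
A: 25
Q: What is 7+8?
A:
```

Given this prompt, a base model will most likely complete the final line with "15".

This is few-shot (in-context) learning: the model infers the task from the examples in the prompt, with no weight updates.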
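
A small helper can build prompts in the same Q:/A: layout from any list of demonstrations. This is a sketch with hypothetical function names: it ends the prompt with an open "A:" so the likeliest continuation is the answer. Because a base model often keeps going and invents further Q/A pairs, the second helper keeps only the first line of the completion (many inference APIs also accept a stop sequence such as a newline for the same purpose).

```python
def build_few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Format (question, answer) demonstrations, then leave the final answer open."""
    lines: list[str] = []
    for question, answer in examples:
        lines.append(f"Q: {question}")
        lines.append(f"A: {answer}")
    lines.append(f"Q: {query}")
    lines.append("A:")  # the model's continuation should be the answer
    return "\n".join(lines)


def extract_answer(completion: str) -> str:
    """Keep only the first line, discarding any extra Q/A pairs the model invents."""
    return completion.split("\n", 1)[0].strip()


prompt = build_few_shot_prompt(
    [("What is 2+2?", "4"), ("What is 10+15?", "25")],
    "What is 7+8?",
)
print(prompt)
# A hypothetical raw completion that runs past the answer:
print(extract_answer(" 15\nQ: What is 3+3?\nA: 6"))
```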

Examples of Base Models

| Model | Organization | Notes |
|-------|--------------|-------|
| GPT-3 (davinci) | OpenAI | Classic base model |
| LLaMA 3 (base) | Meta | Open weights, widely used |
| Mistral 7B (base) | Mistral | Efficient, strong base |
| Falcon (base) | TII | Open base model |
| Gemma (base) | Google | Lightweight base |

Use Cases for Base Models (Practitioner View)

1. Starting point for fine-tuning: domain-specific fine-tuning (medical, legal, code)
2. Research: studying emergent capabilities, scaling laws, mechanistic interpretability
3. Custom alignment: organizations that want to apply their own RLHF pipeline
4. Evaluation: benchmarking pre-training quality before alignment
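
For the evaluation use case, one widely used measure of pre-training quality (the list above does not name a specific metric) is perplexity on held-out text: the exponential of the average negative log-likelihood per token, exp(-(1/N) * sum of log p(token_i | preceding tokens)). The sketch below assumes you already have per-token natural-log probabilities from a model; the function name is hypothetical.

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """Exponential of the mean negative log-likelihood; lower is better."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# A model that assigns probability 0.5 to every actual next token:
print(perplexity([math.log(0.5)] * 4))
```

A perplexity of 2 means the model is, on average, as uncertain as a fair coin flip between two tokens. Scores are only comparable between models that share a tokenizer and are evaluated on the same text.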

Accessing Base Models

  • Hugging Face Hub (e.g., meta-llama/Meta-Llama-3-8B)
  • AWS Bedrock (some base models available)
  • Replicate, Together AI (hosted inference)
  • Direct download + run with Ollama, llama.cpp, vLLM
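
For the Hugging Face route, the following is a minimal sketch of plain text completion with the `transformers` library. It assumes `transformers` and `torch` are installed, that you have accepted the Llama 3 license on the Hub and logged in (the repository is gated), and that you have enough memory for an 8B model (roughly 16 GB of weights in 16-bit precision). The prompt is passed as raw text with no chat template, because a base model was not trained on one. The function name is hypothetical.

```python
def generate_completion(prompt: str,
                        model_id: str = "meta-llama/Meta-Llama-3-8B",
                        max_new_tokens: int = 20) -> str:
    """Run plain text completion on a base model (no chat template)."""
    # Imported lazily so this file can be loaded without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Greedy decoding (do_sample=False) picks the most likely token at each step.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Drop the prompt tokens and decode only the newly generated continuation.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

For example, `generate_completion("Q: What is 7+8?\nA:")` returns the model's raw continuation, which, as with any base model, may run on past the answer.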

Related Concepts

  • Pre-training, Instruct Model, Fine-Tuning, RLHF, Parameters, Foundation Model
