Temperature

Definition

Temperature is a hyperparameter that controls the randomness (or "creativity") of an LLM's output during token sampling. It scales the model's raw output scores (logits) before converting them to probabilities, thereby controlling how peaked or flat the probability distribution over the vocabulary is.

The Mathematics

After the final layer, the model produces logits — raw unnormalized scores for each vocabulary token.

Temperature T (T > 0) is applied before the softmax by dividing every logit by T:

```
probability(token_i) = softmax(logits / T)[i]
                     = exp(logit_i / T) / Σ_j exp(logit_j / T)
```
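Here is a minimal sketch of the formula in plain Python. It assumes the logits arrive as a list of floats, and the function name `softmax_with_temperature` is illustrative rather than taken from any library. Before exponentiating, the code subtracts the largest scaled logit. This is a standard numerical-stability trick, and it does not change the result because softmax is unaffected when the same constant is added to every input.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / T) as a list of probabilities (requires T > 0)."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; handle T = 0 as argmax separately")
    scaled = [z / temperature for z in logits]
    # Shift by the max so exp() never overflows; softmax is shift-invariant.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Usage: the same logits at a low, neutral, and high temperature.
logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

As T drops below 1.0, the top token's probability rises. As T goes above 1.0, that probability falls and the remaining mass spreads across the other tokens.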

Effect of Temperature

T = 0 (Greedy)

  • As T → 0, all probability mass moves to the single highest-scoring token; T = 0 itself is implemented as argmax
  • Deterministic in principle: the same prompt produces the same output (see the floating-point caveats below)
  • Very focused, but can be repetitive and "safe"

T < 1.0 (e.g., 0.2–0.7)

  • Distribution becomes more peaked (high-probability tokens gain even more probability)
  • More consistent, conservative outputs
  • Good for: factual Q&A, code generation, structured data extraction

T = 1.0

  • Uses the model's raw probability distribution unchanged
  • Balanced creativity and coherence

T > 1.0 (e.g., 1.2–2.0)

  • Distribution becomes flatter (low-probability tokens become more likely)
  • More random, creative, and surprising, but also less coherent
  • Good for: creative writing, brainstorming, diverse outputs
  • High temperatures can produce gibberish or off-topic content
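The regimes above can be combined into one sampling function. `sample_token` is a hypothetical helper, not a library API. Following the usual convention for greedy decoding, it treats T = 0 as a special case and returns the argmax. For any other T, it samples from the temperature-scaled softmax using Python's `random.choices`.

```python
import math
import random


def sample_token(logits, temperature, rng=random):
    """Return a token index: argmax when T == 0, else a draw from softmax(logits / T)."""
    if temperature < 0:
        raise ValueError("temperature must be >= 0")
    if temperature == 0:
        # Greedy decoding: the T -> 0 limit puts all probability on the top logit.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    # Unnormalized weights; random.choices normalizes them internally.
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


# Usage: count which token wins over 1,000 draws at each temperature.
rng = random.Random(0)  # fixed seed so the demo is repeatable
logits = [2.0, 1.0, 0.1]
for t in (0, 0.5, 1.5):
    picks = [sample_token(logits, t, rng) for _ in range(1000)]
    print(f"T={t}: counts per token = {[picks.count(i) for i in range(len(logits))]}")
```

At T = 0 every draw returns index 0. Higher temperatures spread the draws across more tokens.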
Visualization

```
Vocab token probabilities at different temperatures (example):

Token:   "Paris"  "France"  "London"  "dog"  "purple"
T=0.1:    0.97     0.02      0.01      0.00   0.00
T=0.7:    0.65     0.20      0.12      0.02   0.01
T=1.0:    0.50     0.25      0.15      0.07   0.03
T=1.5:    0.35     0.28      0.20      0.12   0.05
T=2.0:    0.22     0.22      0.21      0.18   0.17
```
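The rows above are rounded illustrations, not outputs computed from one fixed set of logits. This sketch cross-checks them under a stated assumption: the T=1.0 row is treated as the model's true distribution (a hypothetical, not real model output), logits are recovered as log(p), and every row is recomputed exactly. Under that assumption, "Paris" receives more than 0.99 of the mass at T=0.1. At T=2.0 the distribution is flatter than at T=1.0 but still far from uniform.

```python
import math

tokens = ["Paris", "France", "London", "dog", "purple"]
# Assumption: the T=1.0 row is the model's true distribution, so the logits
# equal log(p) up to an additive constant (which softmax ignores).
base_probs = [0.50, 0.25, 0.15, 0.07, 0.03]
logits = [math.log(p) for p in base_probs]


def probs_at(temperature):
    """Exact softmax(logits / T) for the hypothetical logits above."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


print("Token:   " + "  ".join(f"{name:>7}" for name in tokens))
for t in (0.1, 0.7, 1.0, 1.5, 2.0):
    print(f"T={t}:   " + "  ".join(f"{p:>7.3f}" for p in probs_at(t)))
```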

Recommended Temperature by Use Case

| Use Case | Recommended T | Reason |
|----------|---------------|--------|
| Code generation | 0.0–0.2 | Deterministic, correct syntax |
| Factual Q&A | 0.0–0.3 | Accurate, consistent facts |
| Structured data extraction | 0.0–0.2 | Consistent format |
| Summarization | 0.3–0.5 | Coherent, slight variation OK |
| Conversational chat | 0.7–1.0 | Natural, varied responses |
| Creative writing | 0.8–1.2 | Expressive, imaginative |
| Brainstorming / ideation | 1.0–1.4 | Diverse ideas |
| Poetry / experimental | 1.0–2.0 | Maximum creativity |

Temperature + Other Sampling Parameters

Temperature works alongside other sampling methods:

  • Top-P (nucleus sampling): usually applied after temperature scaling; samples only from the smallest set of tokens whose cumulative probability reaches P
  • Top-K: also usually applied after temperature; restricts sampling to the K most likely tokens
  • A common production combination: temperature=0.7, top_p=0.9
  • For deterministic output: temperature=0, or top_k=1 (both reduce to greedy decoding); top_p=1 alone does not do this, because it keeps the full distribution
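The sketch below implements one common ordering: temperature first, then top-k, then top-p. It assumes a plain list of logits. `filtered_distribution` is a hypothetical helper, and real inference libraries may order or implement these filters differently, for example in how they break ties or renormalize between steps.

```python
import math


def filtered_distribution(logits, temperature=1.0, top_k=None, top_p=None):
    """Temperature-scale the logits, then optionally apply top-k and top-p filtering.

    Returns (token_index, probability) pairs, most likely first, renormalized to sum to 1.
    """
    if temperature <= 0:
        raise ValueError("temperature must be > 0 here; use argmax for T = 0")
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    ranked = sorted(((i, e / total) for i, e in enumerate(exps)),
                    key=lambda pair: pair[1], reverse=True)

    def renormalize(pairs):
        mass = sum(p for _, p in pairs)
        return [(i, p / mass) for i, p in pairs]

    if top_k is not None:
        # Keep only the K most likely tokens.
        ranked = renormalize(ranked[:top_k])
    if top_p is not None:
        # Keep the smallest prefix whose cumulative probability reaches top_p.
        kept, cumulative = [], 0.0
        for i, p in ranked:
            kept.append((i, p))
            cumulative += p
            if cumulative >= top_p:
                break
        ranked = renormalize(kept)
    return ranked


# Usage: temperature=0.7 with top_p=0.9 on a hypothetical four-token vocabulary.
logits = [2.0, 1.0, 0.1, -1.0]
for idx, prob in filtered_distribution(logits, temperature=0.7, top_p=0.9):
    print(idx, round(prob, 3))
```

With `top_k=1`, only the highest-probability token remains, which is why it behaves like greedy decoding. With `top_p=1.0`, every token is kept.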

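Temperature = 0 is NOT True Zero

Dividing the logits by zero is undefined, so temperature=0 is implemented as a special case: argmax (always pick the top token). Even so, outputs can vary slightly between runs. Tiny numeric differences can flip the argmax when two logits are nearly tied, and the rest of the generation then diverges. Sources of such differences include:

  • GPU floating-point non-associativity (the order of additions changes the rounded result)
  • Parallel computation ordering
  • Different hardware backends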
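The following is a contrived illustration of the first bullet. It relies only on standard IEEE-754 double arithmetic in Python. The same three numbers summed in two different orders give different results. When a competing logit is nearly tied, that difference alone flips a greedy pick. Real GPU kernels sum far more terms, often in lower precision, so the values here are hypothetical and only show the mechanism.

```python
def argmax(values):
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


contributions = [0.1, 0.2, 0.3]  # hypothetical partial sums that make up one logit

# Same numbers, two summation orders (as two different parallel reductions might use).
left_to_right = (contributions[0] + contributions[1]) + contributions[2]
right_to_left = contributions[0] + (contributions[1] + contributions[2])
print(left_to_right, right_to_left, left_to_right == right_to_left)

other_logit = 0.6  # a competing token whose logit is nearly tied
print(argmax([other_logit, left_to_right]))  # greedy pick under summation order 1
print(argmax([other_logit, right_to_left]))  # greedy pick under summation order 2
```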
Temperature in System Design

For production applications:

  • Set temperature per use case; don't use one global temperature
  • Evaluation prompts: T=0 for (near-)reproducible eval results
  • User-facing generation: T=0.7–1.0 for natural variation
  • Multiple completions per request (e.g., OpenAI's n parameter): use a higher T so the candidates actually differ
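The first bullet can be implemented as a small lookup table keyed by use case. Everything in this sketch is hypothetical: the `SamplingConfig` class, the use-case keys, and the `n=4` candidate count. The temperatures are picked from within the ranges recommended above. An unknown use case raises an error rather than silently falling back to a global default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingConfig:
    temperature: float
    n: int = 1  # number of completions to request


# Hypothetical per-use-case settings chosen from the recommended ranges.
SAMPLING_BY_USE_CASE = {
    "eval": SamplingConfig(temperature=0.0),
    "extraction": SamplingConfig(temperature=0.0),
    "chat": SamplingConfig(temperature=0.8),
    "candidates": SamplingConfig(temperature=1.0, n=4),
}


def sampling_for(use_case: str) -> SamplingConfig:
    """Look up the sampling config for a use case; fail loudly on unknown keys."""
    try:
        return SAMPLING_BY_USE_CASE[use_case]
    except KeyError:
        raise ValueError(f"no sampling config for use case {use_case!r}") from None


# Usage
print(sampling_for("chat"))
print(sampling_for("candidates"))
```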
Relationship to Entropy

Temperature directly controls the entropy of the output distribution. Unless all logits are equal, entropy increases monotonically as T increases:

  • Low T → low entropy (predictable)
  • High T → high entropy (uncertain/random)
  • In information-theoretic terms, "creative" output means sampling from a higher-entropy distribution
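This sketch computes the Shannon entropy, in bits, of the temperature-scaled distribution over a range of T values. The four logits are hypothetical. With four tokens the maximum possible entropy is log2(4) = 2 bits, reached only when the distribution is uniform.

```python
import math


def entropy_bits(logits, temperature):
    """Shannon entropy (in bits) of softmax(logits / T)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Skip zero-probability terms, since p * log(p) -> 0 as p -> 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Usage: entropy rises with T and approaches (but never reaches) 2 bits.
logits = [2.0, 1.0, 0.1, -1.0]
for t in (0.25, 0.5, 1.0, 2.0, 4.0):
    print(f"T={t}: H={entropy_bits(logits, t):.3f} bits")
```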
API Parameter Names

| Platform | Parameter Name |
|----------|----------------|
| OpenAI | temperature |
| Anthropic Claude | temperature |
| AWS Bedrock | temperature |
| Hugging Face | temperature |
| Ollama | temperature |

Range: 0.0–2.0 (OpenAI); 0.0–1.0 (Anthropic)
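Because the ranges differ, a temperature that is valid on one platform can be rejected on another. `check_temperature` below is a hypothetical helper and is not part of any SDK. It encodes only the two ranges stated above, and it raises an error for any platform whose range it does not know.

```python
# Ranges as stated in the text; deliberately limited to the two platforms it lists.
TEMPERATURE_RANGES = {
    "openai": (0.0, 2.0),
    "anthropic": (0.0, 1.0),
}


def check_temperature(platform: str, temperature: float) -> float:
    """Return temperature unchanged, or raise ValueError if it is out of range."""
    if platform not in TEMPERATURE_RANGES:
        raise ValueError(f"no known temperature range for {platform!r}")
    low, high = TEMPERATURE_RANGES[platform]
    if not low <= temperature <= high:
        raise ValueError(
            f"{platform} temperature must be in [{low}, {high}], got {temperature}"
        )
    return temperature


# Usage: the same setting passes on one platform and fails on the other.
print(check_temperature("openai", 1.5))
try:
    check_temperature("anthropic", 1.5)
except ValueError as err:
    print(err)
```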

Related Concepts

  • Inference, Sampling, Top-P, Top-K, Greedy Decoding, Logits, Randomness

Go Deeper With Live Instruction

This topic is covered in depth in our LLM engineering program (Session 6).