Chain of Thought (CoT)

Definition

Chain of Thought (CoT) prompting is a technique that encourages an LLM to produce intermediate reasoning steps before arriving at a final answer. By "thinking out loud," the model decomposes complex problems into manageable steps, which can substantially improve accuracy on tasks that require multi-step reasoning.

The Core Discovery

CoT prompting was introduced in the paper "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" (Wei et al., Google Brain, 2022).

Without CoT:

```
Q: A store sells 10 apples at $0.50 each and 5 oranges at $0.75 each. What's the total?
A: $5.75 (incorrect; direct answers are often wrong without reasoning)
```

With CoT:

```
Q: A store sells 10 apples at $0.50 each and 5 oranges at $0.75 each. What's the total?
A: Let me calculate step by step.
Apples: 10 × $0.50 = $5.00
Oranges: 5 × $0.75 = $3.75
Total: $5.00 + $3.75 = $8.75
Answer: $8.75
```
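The arithmetic in the chain above can be checked mechanically. The following is a small sketch (the variable names are illustrative, not from the original) that mirrors each reasoning line with one computation, using `Decimal` so the currency values stay exact:

```python
from decimal import Decimal

# Each intermediate value corresponds to one line of the reasoning chain.
apples = 10 * Decimal("0.50")    # Apples: 10 × $0.50
oranges = 5 * Decimal("0.75")    # Oranges: 5 × $0.75
total = apples + oranges         # Total: apples + oranges

print(f"Apples: ${apples}")
print(f"Oranges: ${oranges}")
print(f"Total: ${total}")
```

Writing the steps out this way makes it easy to see where a wrong direct answer such as $5.75 would have had to skip or miscompute an intermediate value.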

Types of CoT Prompting

1. Few-Shot CoT (Original)

Provide examples that include reasoning chains:

```
Q: [Example problem]
A: [Step-by-step reasoning] → [Answer]

Q: [Example problem 2]
A: [Step-by-step reasoning] → [Answer]

Q: [New problem]
A:
```
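One way to assemble a prompt in this shape programmatically is sketched below. The function name `build_few_shot_cot_prompt` and the tuple layout are assumptions made for illustration, and the second and third questions are made-up examples rather than anything from the original paper:

```python
from typing import List, Tuple

def build_few_shot_cot_prompt(
    examples: List[Tuple[str, str, str]], question: str
) -> str:
    """Build a few-shot CoT prompt from (question, reasoning, answer) triples.

    Hypothetical helper: the exact wording between reasoning and answer
    is a free choice, not a fixed standard.
    """
    blocks = [
        f"Q: {q}\nA: {reasoning} Answer: {answer}"
        for q, reasoning, answer in examples
    ]
    # The new problem ends with an open "A:" so the model continues the pattern.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)

examples = [
    (
        "A store sells 10 apples at $0.50 each and 5 oranges at $0.75 each. "
        "What's the total?",
        "Apples: 10 × $0.50 = $5.00. Oranges: 5 × $0.75 = $3.75. "
        "Total: $5.00 + $3.75 = $8.75.",
        "$8.75",
    ),
    (
        "A notebook costs $2.00 and a pen costs $1.50. "
        "What do 3 notebooks and 2 pens cost?",
        "Notebooks: 3 × $2.00 = $6.00. Pens: 2 × $1.50 = $3.00. "
        "Total: $6.00 + $3.00 = $9.00.",
        "$9.00",
    ),
]

prompt = build_few_shot_cot_prompt(
    examples, "A bag holds 4 boxes of 6 eggs each. How many eggs are there?"
)
print(prompt)
```

The resulting string would then be sent to whichever model API is in use; that call is deliberately left out here.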

2. Zero-Shot CoT

Simply append a trigger phrase; no examples are needed:

```
Q: [Problem]
A: Let's think step by step.
```

Effective trigger phrases:

  • "Let's think step by step."
  • "Think carefully before answering."
  • "Let's work through this systematically."
  • "First, let me break this down."
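A zero-shot CoT prompt is simple enough to build with a single helper. In this sketch, the function name and the default trigger are illustrative choices; any of the phrases listed above could be passed instead:

```python
def build_zero_shot_cot_prompt(
    question: str, trigger: str = "Let's think step by step."
) -> str:
    """Append a reasoning trigger to a question (hypothetical helper)."""
    return f"Q: {question}\nA: {trigger}"

zero_shot_prompt = build_zero_shot_cot_prompt(
    "A store sells 10 apples at $0.50 each and 5 oranges at $0.75 each. "
    "What's the total?"
)
print(zero_shot_prompt)
```

Keeping the trigger as a parameter makes it easy to compare phrases on the same set of questions.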
3. Auto-CoT

Automatically generate reasoning chains using the model itself, then use those as few-shot examples.
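The idea can be sketched in a few lines. This is a simplified illustration, not the published Auto-CoT procedure: it omits any selection or filtering of questions, and `generate` is a hypothetical stand-in for a real model call (here replaced by a stub so the example runs offline):

```python
from typing import Callable, List

TRIGGER = "Let's think step by step."

def auto_cot_demos(
    questions: List[str], generate: Callable[[str], str]
) -> List[str]:
    """Ask the model for a zero-shot chain per question and keep it as a demo."""
    demos = []
    for q in questions:
        chain = generate(f"Q: {q}\nA: {TRIGGER}")
        demos.append(f"Q: {q}\nA: {TRIGGER} {chain}")
    return demos

def auto_cot_prompt(demos: List[str], new_question: str) -> str:
    """Prepend the generated demos to the new question as few-shot examples."""
    return "\n\n".join(demos + [f"Q: {new_question}\nA: {TRIGGER}"])

def fake_generate(prompt: str) -> str:
    # Stub standing in for an LLM call; a real model would return reasoning text.
    return "(model-written reasoning would appear here)"

demos = auto_cot_demos(["What is 12 × 3?", "What is 45 + 19?"], fake_generate)
auto_prompt = auto_cot_prompt(demos, "What is 7 × 8 + 5?")
print(auto_prompt)
```

Because the demonstrations are model-written, they can contain mistakes, so in practice some quality check on the generated chains is usually worth adding.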

4. Self-Consistency CoT

1. Generate N independent reasoning chains (with temperature > 0)
2. Each chain may reach a different answer
3. Take the majority-vote answer

This is typically more reliable than single-chain CoT, at the cost of N generations instead of one.
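The voting step can be sketched as follows. `sample_answer` is a hypothetical callable that would run one CoT generation at temperature > 0 and return only the extracted final answer; here it is replaced by a fixed sequence of answers so the example is deterministic:

```python
from collections import Counter
from typing import Callable

def self_consistent_answer(sample_answer: Callable[[], str], n: int = 5) -> str:
    """Sample n final answers and return the most common one (majority vote)."""
    answers = [sample_answer() for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# Stand-in for five sampled chains: three agree, two disagree.
canned = iter(["$8.75", "$5.75", "$8.75", "$8.75", "$8.50"])
voted = self_consistent_answer(lambda: next(canned), n=5)
print(voted)
```

Note that voting only works on normalized answers, so in a real pipeline "$8.75" and "8.75 dollars" would need to be mapped to the same form before counting.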

5. Tree of Thoughts (ToT)

  • Extends CoT into a tree structure
  • The model explores multiple reasoning branches simultaneously
  • Backtracks from dead ends
  • The best path is selected via search/evaluation
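The search structure can be illustrated with a toy problem. In the sketch below, `propose`, `score`, and `is_goal` are hypothetical callables that an LLM would normally implement (proposing next thoughts and evaluating them); here they are plain functions over numbers. The search is a breadth-first beam search, so dead ends are abandoned by pruning rather than by explicit backtracking:

```python
from typing import Callable, List, Optional, Tuple

State = Tuple[int, Tuple[str, ...]]  # (current value, steps taken so far)

def tree_of_thoughts(
    root: State,
    propose: Callable[[State], List[State]],
    score: Callable[[State], float],
    is_goal: Callable[[State], bool],
    beam_width: int = 2,
    max_depth: int = 4,
) -> Optional[State]:
    """Expand every state in the frontier, keep the best few, repeat."""
    frontier = [root]
    for _ in range(max_depth):
        candidates = [child for state in frontier for child in propose(state)]
        for candidate in candidates:
            if is_goal(candidate):
                return candidate
        # Prune: weaker branches are dropped, which is how dead ends are left.
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]
    return None

TARGET = 14

def propose(state: State) -> List[State]:
    value, path = state
    return [(value + 3, path + ("+3",)), (value * 2, path + ("*2",))]

def score(state: State) -> float:
    return -abs(TARGET - state[0])  # closer to the target is better

def is_goal(state: State) -> bool:
    return state[0] == TARGET

result = tree_of_thoughts((1, ()), propose, score, is_goal)
print(result)
```

Starting from 1 with the moves "+3" and "*2", the search keeps the two most promising partial paths at each depth and stops as soon as one of them reaches 14.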
When CoT Helps Most

| Task Type | Benefit |
|-----------|---------|
| Multi-step arithmetic | High |
| Symbolic reasoning | High |
| Logic puzzles | High |
| Commonsense reasoning | Moderate-High |
| Code debugging | High |
| Multi-hop QA | High |
| Simple factual QA | Low (may hurt by adding noise) |
| Classification | Low-Moderate |

Mechanism: Why CoT Works

1. Scratchpad effect: intermediate steps serve as working memory the model can reference
2. Error decomposition: each small step is easier than the full problem
3. Self-checking: making steps explicit creates opportunities to catch errors
4. Attention shaping: reasoning tokens in the output condition subsequent token predictions

Key insight: the tokens in the reasoning chain are real computation. Because generation is autoregressive, every reasoning token becomes part of the context the model conditions on when it predicts the next token.

CoT and Model Scale

CoT has been described as an emergent ability: with prompting alone, it only works well on sufficiently large models. As a rough guide:

  • < 10B params: minimal benefit
  • ~10–50B params: moderate benefit
  • > 100B params: strong benefit

Smaller models trained on CoT data can still learn this skill (distillation).
Modern LLMs and "Thinking" Tokens

Frontier reasoning models (OpenAI o1, Claude 3.7 Sonnet and later with extended thinking, Gemini's thinking models) implement extended CoT via dedicated thinking/reasoning tokens that are generated before the final answer. These tokens:

  • Are often hidden from the user or shown only in summarized form (internal scratchpad)
  • Can run to thousands of tokens of intermediate reasoning
  • Substantially improve results on complex reasoning benchmarks
  • Are the reason such systems are described as using "extended thinking" or as "reasoning models"
CoT in Practice

Prompt Pattern

```
You are a math tutor. When solving problems, always show your reasoning step by step.
For each step, explain why you're taking that step.

Problem: [user's problem]

Let me work through this step by step:
```

With Output Parsing

For structured output, separate the chain from the answer:

```
Reasoning: [full chain of thought]
Final Answer: [just the answer]
```
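A response in this format can be split with a regular expression. The sketch below assumes the model reliably emits the literal labels `Reasoning:` and `Final Answer:`; the function name `parse_cot_output` is illustrative, and it returns `None` when the labels are missing so the caller can retry or fall back:

```python
import re
from typing import Optional, Tuple

# DOTALL lets the reasoning span several lines; the lazy group stops at the
# first "Final Answer:" label.
_COT_PATTERN = re.compile(
    r"Reasoning:\s*(?P<reasoning>.*?)\s*Final Answer:\s*(?P<answer>.*)",
    re.DOTALL,
)

def parse_cot_output(text: str) -> Optional[Tuple[str, str]]:
    """Return (reasoning, answer), or None if the expected labels are absent."""
    match = _COT_PATTERN.search(text)
    if match is None:
        return None
    return match.group("reasoning").strip(), match.group("answer").strip()

response = (
    "Reasoning: Apples: 10 × $0.50 = $5.00\n"
    "Oranges: 5 × $0.75 = $3.75\n"
    "Total: $5.00 + $3.75 = $8.75\n"
    "Final Answer: $8.75"
)
parsed = parse_cot_output(response)
print(parsed)
```

Downstream code can then log or discard the reasoning and pass only the answer along, which keeps the chain from leaking into structured fields.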

Limitations

| Limitation | Notes |
|------------|-------|
| Hallucinated reasoning | Model can produce fluent but wrong reasoning chains |
| Cost | Reasoning tokens consume context and increase latency |
| Not always better | Simple tasks → CoT adds noise, not signal |
| Scale dependency | Weak on small models |
| Faithfulness question | Reasoning may not reflect actual computation |

Related Concepts

  • Few-Shot, Zero-Shot, Prompt, Inference, Reasoning Models, Tree of Thoughts, Self-Consistency
