Chain-of-thought (CoT)
Chain-of-thought prompting asks a model to reason step-by-step before producing its final answer, which substantially improves accuracy on hard problems.
Chain-of-thought prompting is the simple, powerful technique of asking a model to show its work before giving a final answer. Instead of "What's 24 × 17?" you ask "What's 24 × 17? Think step by step." The model produces a chain of intermediate reasoning ("24 × 10 = 240, 24 × 7 = 168, 240 + 168 = 408") and arrives at a better final answer.
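The pattern above can be sketched in a few lines of Python. This is a minimal illustration, not a real integration: `call_model` is a hypothetical stand-in for whatever LLM API you use, and here it returns the canned chain from the example so the snippet is self-contained.

```python
# Zero-shot chain-of-thought: append a "think step by step" trigger,
# then parse the final answer out of the reasoning chain.

COT_SUFFIX = (
    " Think step by step, then give the final answer"
    " on a line starting with 'Answer:'."
)

def build_cot_prompt(question: str) -> str:
    """Turn a plain question into a zero-shot CoT prompt."""
    return question + COT_SUFFIX

def call_model(prompt: str) -> str:
    # Hypothetical stand-in for an LLM API call; a real model would
    # generate a chain like this one in response to the CoT prompt.
    return (
        "24 x 10 = 240\n"
        "24 x 7 = 168\n"
        "240 + 168 = 408\n"
        "Answer: 408"
    )

def extract_answer(completion: str) -> str:
    """Return only the final answer, discarding the reasoning chain."""
    for line in completion.splitlines():
        if line.startswith("Answer:"):
            return line.removeprefix("Answer:").strip()
    return completion.strip()  # fall back to the raw completion

prompt = build_cot_prompt("What's 24 x 17?")
answer = extract_answer(call_model(prompt))
print(answer)  # -> 408
```

Asking for the answer on a marked line is one simple way to separate the chain (useful for logs) from the answer (what the user sees).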
The technique was introduced by Google researchers in 2022 and remains one of the highest-ROI prompt patterns. Accuracy gains on math, logic, and multi-step reasoning benchmarks typically range from 10 to 40 percentage points simply from appending "let's think step by step" to a prompt.
In 2024-2025, AI labs began building chain-of-thought into the models themselves. Reasoning models such as OpenAI's o-series, Claude with extended thinking, and Gemini 2.0 Flash Thinking spend extra compute "thinking" internally before responding, with no special prompt needed. These models cost more in latency and tokens but reach expert-level performance on competition math, programming, and PhD-level science benchmarks.
Practical CoT tips:
- Don't ask for both reasoning and a structured output in the same turn. Models often skip the reasoning to satisfy the structure. Use two stages: reason first, format second.
- For production, hide the chain. Users mostly want answers, not reasoning. Capture the chain in logs for debugging.
- For reasoning models, you can usually drop manual CoT prompting. The model does it internally. Adding "think step by step" to a reasoning model is redundant and sometimes hurts.
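The first two tips combine into a simple two-stage pipeline: one unconstrained reasoning call, then one formatting call, with the chain kept server-side. A hedged sketch, again with `call_model` as a hypothetical stand-in that returns canned responses so the code runs on its own:

```python
import json

def call_model(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; returns canned
    text so the sketch stays self-contained."""
    if "Return ONLY a JSON object" in prompt:
        return '{"answer": 408, "confidence": "high"}'
    return "24 x 10 = 240; 24 x 7 = 168; 240 + 168 = 408. So, 408."

def answer_with_cot(question: str) -> dict:
    # Stage 1: free-form reasoning, no structural constraints.
    chain = call_model(f"{question}\nThink step by step.")

    # Production note: capture `chain` in logs for debugging,
    # but don't return it to the user.

    # Stage 2: formatting only. The model sees the finished
    # reasoning and is asked for structure, not fresh thinking.
    formatted = call_model(
        "Given this reasoning:\n" + chain + "\n"
        'Return ONLY a JSON object like '
        '{"answer": ..., "confidence": ...}.'
    )
    return json.loads(formatted)

result = answer_with_cot("What's 24 x 17?")
print(result["answer"])  # -> 408
```

Because stage 2 never has to reason, it can be held to a strict schema (or a smaller, cheaper model) without the "skipped reasoning" failure mode.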
FAQ
Does chain-of-thought work on small models?
Less reliably. The benefits scale with model size, and chain-of-thought can even hurt very small models, which tend to make errors in the intermediate steps and then commit to them. It's most effective on models of roughly 7B parameters and up, and dramatically so on frontier models.
Related terms
- Prompt engineering — Prompt engineering is the craft of writing instructions to a language model so it produces reliable, accurate, useful outputs.
- LLM (Large Language Model) — A Large Language Model is a neural network trained on huge volumes of text to predict the next token, which produces emergent capabilities like reasoning, code generation, and translation.
- AI agent — An AI agent is an LLM-driven system that takes actions in the world — calling tools, browsing, writing code, finishing tasks — instead of just answering questions.
Want to actually build with this?
Our Stack Builder picks the best AI tools for your specific project in under 60 seconds.
Build my stack →