Chain-of-thought (CoT)
Chain-of-thought prompting asks a model to reason step-by-step before producing its final answer, which substantially improves accuracy on hard problems.
Chain-of-thought prompting is the simple, powerful technique of asking a model to show its work before giving a final answer. Instead of "What's 24 × 17?" you ask "What's 24 × 17? Think step by step." The model produces a chain of intermediate reasoning ("24 × 10 = 240, 24 × 7 = 168, 240 + 168 = 408") and arrives at a better final answer.
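The pattern above can be sketched in a few lines of Python. This is a minimal illustration, not a real integration: `call_model` is a hypothetical stand-in for whatever LLM API you use, and here it returns the canned chain from the example so the snippet is self-contained.

```python
# Zero-shot chain-of-thought: append a "think step by step" trigger,
# then parse the final answer out of the reasoning chain.

COT_SUFFIX = (
    " Think step by step, then give the final answer"
    " on a line starting with 'Answer:'."
)

def build_cot_prompt(question: str) -> str:
    """Turn a plain question into a zero-shot CoT prompt."""
    return question + COT_SUFFIX

def call_model(prompt: str) -> str:
    # Hypothetical stand-in for an LLM API call; a real model would
    # generate a chain like this one in response to the CoT prompt.
    return (
        "24 x 10 = 240\n"
        "24 x 7 = 168\n"
        "240 + 168 = 408\n"
        "Answer: 408"
    )

def extract_answer(completion: str) -> str:
    """Return only the final answer, discarding the reasoning chain."""
    for line in completion.splitlines():
        if line.startswith("Answer:"):
            return line.removeprefix("Answer:").strip()
    return completion.strip()  # fall back to the raw completion

prompt = build_cot_prompt("What's 24 x 17?")
answer = extract_answer(call_model(prompt))
print(answer)  # -> 408
```

Asking for the answer on a marked line is one simple way to separate the chain (useful for logs) from the answer (what the user sees).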
The technique was introduced by Google researchers in 2022 and remains one of the highest-ROI prompt patterns. Accuracy gains on math, logic, and multi-step reasoning benchmarks typically range from 10 to 40 percentage points simply from appending "let's think step by step" to a prompt.
In 2024-2025, AI labs began building chain-of-thought into the models themselves. Reasoning models such as OpenAI's o-series, Claude with extended thinking, and Gemini 2.0 Flash Thinking spend extra compute "thinking" internally before responding, with no special prompt needed. These models cost more in latency and tokens but reach expert-level performance on competition math, programming, and PhD-level science benchmarks.
Practical CoT tips:
- Don't ask for both reasoning and a structured output in the same turn. Models often skip the reasoning to satisfy the structure. Use two stages: reason first, format second.
- For production, hide the chain. Users mostly want answers, not reasoning. Capture the chain in logs for debugging.
- For reasoning models, you can usually drop manual CoT prompting. The model does it internally. Adding "think step by step" to a reasoning model is redundant and sometimes hurts.
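The first two tips combine into a simple two-stage pipeline: one unconstrained reasoning call, then one formatting call, with the chain kept server-side. A hedged sketch, again with `call_model` as a hypothetical stand-in that returns canned responses so the code runs on its own:

```python
import json

def call_model(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; returns canned
    text so the sketch stays self-contained."""
    if "Return ONLY a JSON object" in prompt:
        return '{"answer": 408, "confidence": "high"}'
    return "24 x 10 = 240; 24 x 7 = 168; 240 + 168 = 408. So, 408."

def answer_with_cot(question: str) -> dict:
    # Stage 1: free-form reasoning, no structural constraints.
    chain = call_model(f"{question}\nThink step by step.")

    # Production note: capture `chain` in logs for debugging,
    # but don't return it to the user.

    # Stage 2: formatting only. The model sees the finished
    # reasoning and is asked for structure, not fresh thinking.
    formatted = call_model(
        "Given this reasoning:\n" + chain + "\n"
        'Return ONLY a JSON object like '
        '{"answer": ..., "confidence": ...}.'
    )
    return json.loads(formatted)

result = answer_with_cot("What's 24 x 17?")
print(result["answer"])  # -> 408
```

Because stage 2 never has to reason, it can be held to a strict schema (or a smaller, cheaper model) without the "skipped reasoning" failure mode.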
FAQ
Does chain-of-thought work on small models?
Less reliably. The benefits scale with model size, and chain-of-thought can even hurt very small models, which tend to make errors in the intermediate steps and then commit to them. It's most effective on models of roughly 7B parameters and up, and dramatically so on frontier models.
Related terms
- Prompt engineering — Prompt engineering is the craft of writing instructions to a language model so it produces reliable, accurate, useful outputs.
- LLM (Large Language Model) — A Large Language Model is a neural network trained on huge volumes of text to predict the next token, which produces emergent capabilities like reasoning, code generation, and translation.
- AI agent — An AI agent is an LLM-driven system that takes actions in the world — calling tools, browsing, writing code, finishing tasks — instead of just answering questions.
Want to actually build with this?
Our Stack Builder picks the best AI tools for your specific project in under 60 seconds.
Build my stack →