LLM (Large Language Model)

A Large Language Model is a neural network trained on huge volumes of text to predict the next token, which produces emergent capabilities like reasoning, code generation, and translation.

More concretely: the model is usually a transformer that learned to predict the next word (or "token") in a sequence by training on essentially the entire public internet plus large book and code corpora.

That single objective, "what comes next," turns out to be one of the most powerful learning signals ever discovered. To predict the next word in a math proof, you have to learn math. To predict the next line of code, you have to learn programming. To predict the next sentence in a polite reply, you have to learn social context. When you scale the model and the data far enough, capabilities emerge that nobody explicitly trained for: reasoning, translation, summarization, role-play, planning.
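The "what comes next" objective can be sketched in miniature. The toy below uses bigram counts over a handful of words instead of a trained transformer, so the corpus, function names, and sampling loop are purely illustrative; the point is that generation is just "predict, sample, append, repeat":

```python
import random

# Toy "language model": bigram counts stand in for a trained network.
# A real LLM learns these probabilities over ~100k-token vocabularies
# with billions of parameters, but the training signal is the same.
CORPUS = "the cat sat on the mat the cat ate the fish".split()

counts = {}
for prev, nxt in zip(CORPUS, CORPUS[1:]):
    counts.setdefault(prev, {}).setdefault(nxt, 0)
    counts[prev][nxt] += 1

def next_token_distribution(prev):
    """P(next token | previous token), estimated from bigram counts."""
    following = counts.get(prev, {})
    total = sum(following.values())
    return {tok: c / total for tok, c in following.items()}

def generate(start, n_tokens, seed=0):
    """Autoregressive generation: sample a token, append it, repeat."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(n_tokens):
        dist = next_token_distribution(out[-1])
        if not dist:  # dead end: word never appeared mid-corpus
            break
        tokens, probs = zip(*dist.items())
        out.append(rng.choices(tokens, weights=probs)[0])
    return " ".join(out)

print(next_token_distribution("the"))  # "cat" is the most likely follower
print(generate("the", 5))
```

Swap the bigram table for a transformer with billions of weights and the same loop is, in essence, how a chatbot produces its reply.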

Today's frontier LLMs (Claude 4 Sonnet/Opus, GPT-4o, Gemini 2 Pro, Llama 3.1 405B) are general-purpose enough to be the substrate underneath almost every consumer AI product you use. They're sold in three shapes:

  • Hosted API: pay-per-token access (Anthropic, OpenAI, Google). Easiest, fastest to update, no infra.
  • Open weights: download and self-host (Llama, Mistral, Qwen). Cheaper at scale, full privacy, you handle ops.
  • Embedded: runs locally on device (Apple Intelligence, smaller Llama variants). Free, private, capability-limited.
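For the hosted-API shape, a request typically looks something like the sketch below. The endpoint URL, API key, and model name are placeholders, and the payload follows the OpenAI-style chat format that many providers accept, but check your provider's docs for the exact shape:

```python
import json
import urllib.request

API_URL = "https://api.example.com/v1/chat/completions"  # placeholder endpoint
API_KEY = "sk-..."  # your provider-issued key

def build_request(model, prompt, max_tokens=256):
    """Build an OpenAI-style chat-completion payload; most hosted
    providers accept this shape or something close to it."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

def call_hosted_llm(payload):
    """POST the payload; you're billed per input + output token."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_request("some-model-name", "Explain tokens in one sentence.")
```

An open-weights deployment replaces `API_URL` with your own server; the request shape often stays identical, which is what makes switching between the three shapes relatively painless.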

The economics: a token costs roughly $0.50-$15 per million for frontier models, with prices halving every 6-12 months. The capability gap between "best" and "good enough" has narrowed dramatically; many production systems today use mid-tier models because the smartest one isn't worth 10x the cost.
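Those per-million-token prices make cost estimation simple arithmetic. A quick sketch using the price range above; the specific prices and the assumption that output tokens cost more than input tokens are illustrative, not quoted from any provider:

```python
# Back-of-envelope token economics.
# Assumption: prices are quoted per million tokens, input and output
# billed separately (output usually costs more).

def estimate_cost(input_tokens, output_tokens,
                  price_in_per_million, price_out_per_million):
    """Dollar cost of one request at per-million-token prices."""
    return (input_tokens * price_in_per_million
            + output_tokens * price_out_per_million) / 1_000_000

# A 2,000-token prompt with a 500-token reply:
mid_tier = estimate_cost(2000, 500, 0.50, 1.50)    # cheap-end pricing
frontier = estimate_cost(2000, 500, 15.00, 75.00)  # illustrative high end

print(f"mid-tier: ${mid_tier:.5f}, frontier: ${frontier:.5f}")
```

At these rates the same request costs fractions of a cent on a mid-tier model versus several cents on a frontier one, which is exactly the 10x-plus gap that pushes production systems toward "good enough" models.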

FAQ

What's the difference between an LLM and AI?

AI is the umbrella term for any computer system that mimics intelligent behavior. LLMs are one specific type: neural networks trained on text. The current AI boom is mostly LLM-driven, but image generators, robotics, and recommenders are also AI without being LLMs.

Are open-source LLMs as good as Claude or GPT?

The best open-weights models (Llama 3.1 405B, Qwen 2.5 72B) are competitive with the previous-generation frontier (~GPT-4 Turbo) but trail current frontier models by 6-12 months. The gap is closing.

Want to actually build with this?

Our Stack Builder picks the best AI tools for your specific project in under 60 seconds.

Build my stack →