LLM (Large Language Model)
A Large Language Model is a neural network trained on huge volumes of text to predict the next token, which produces emergent capabilities like reasoning, code generation, and translation.
More concretely, an LLM is a neural network, usually a transformer, that learned to predict the next word (or "token") in a sequence by training on massive web crawls plus large book and code corpora.
That single objective, "what comes next," turns out to be one of the most powerful learning signals ever discovered. To predict the next word in a math proof, you have to learn math. To predict the next line of code, you have to learn programming. To predict the next sentence in a polite reply, you have to learn social context. When you scale the model and the data far enough, capabilities emerge that nobody explicitly trained for: reasoning, translation, summarization, role-play, planning.
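To make "predict what comes next" concrete, here is a toy sketch of the idea at its smallest possible scale: a bigram model that counts which word follows which. A real LLM replaces the counting with a transformer over billions of parameters, but the training signal is the same. The corpus and words here are invented for illustration.

```python
from collections import Counter, defaultdict

# Toy "what comes next" model: count which word follows which.
corpus = "the cat sat on the mat the cat ate".split()

following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the most frequently observed next token after `word`."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" - it followed "the" twice, "mat" only once
```

Everything an LLM does is, mechanically, a vastly more sophisticated version of this lookup: given the context so far, output a probability for every possible next token.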
Today's frontier LLMs (Claude 4 Sonnet/Opus, GPT-4o, Gemini 2 Pro, Llama 3.1 405B) are general-purpose enough to be the substrate underneath almost every consumer AI product you use. They're sold in three shapes:
- Hosted API - pay-per-token access (Anthropic, OpenAI, Google). Easiest, fastest to update, no infra.
- Open weights - download and self-host (Llama, Mistral, Qwen). Cheaper at scale, full privacy, you handle ops.
- Embedded - runs locally on device (Apple Intelligence, smaller Llama variants). Free, private, capability-limited.
The economics: a token costs roughly $0.50-$15 per million for frontier models, with prices halving every 6-12 months. The capability gap between "best" and "good enough" has narrowed dramatically; many production systems today use mid-tier models because the smartest one isn't worth 10x the cost.
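The per-million pricing is easy to turn into a budget. A back-of-envelope sketch, using a mid-range price from the range quoted above (the workload numbers are assumptions, not benchmarks):

```python
# Back-of-envelope token economics using the $0.50-$15/million range above.
price_per_million = 3.00     # USD per million tokens, a mid-range assumption
requests_per_day = 10_000    # assumed workload
tokens_per_request = 1_500   # assumed prompt + response, combined

daily_tokens = requests_per_day * tokens_per_request
daily_cost = daily_tokens / 1_000_000 * price_per_million
print(f"${daily_cost:.2f}/day")  # $45.00/day at these assumptions
```

Swap in the top-of-range $15/million price and the same workload costs 5x as much, which is exactly the trade-off that pushes teams toward mid-tier models.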
Real-world examples
- Claude (Anthropic) - best for long-context reasoning and writing
- GPT-4o / o1 (OpenAI) - strong general-purpose, best multimodal
- Gemini 2 (Google) - strong multimodal, huge context window
- Llama (Meta) - open-weights leader, free to self-host
FAQ
What's the difference between an LLM and AI?
AI is the umbrella term for any computer system that mimics intelligent behavior. LLMs are one specific type: neural networks trained on text. The current AI boom is mostly LLM-driven, but image generators, robotics, and recommenders are also AI without being LLMs.
Are open-source LLMs as good as Claude or GPT?
The best open-weights models (Llama 3.1 405B, Qwen 2.5 72B) are competitive with the previous-generation frontier (~GPT-4 Turbo) but trail current frontier models by 6-12 months. The gap is closing.
Related terms
- Transformer architecture - The transformer is the neural network architecture introduced in 2017 that powers every major LLM, built around the attention mechanism that lets each token weigh all other tokens.
- Context window - The context window is the maximum number of tokens (text chunks) a language model can consider at once: both the prompt you send and the response it generates.
- Tokenization - Tokenization is the process of breaking text into chunks (tokens), usually sub-word pieces, that an LLM actually reads and writes.
- Embeddings - An embedding is a list of numbers that represents the meaning of a piece of text, image, or audio so similar things cluster together in vector space.
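The "similar things cluster together" property of embeddings can be shown with a few hand-made vectors. These 3-dimensional values are invented for illustration; real embeddings have hundreds or thousands of dimensions and come from a model:

```python
import math

# Hand-made stand-ins for real embedding vectors (illustrative values only).
embeddings = {
    "dog":   [0.90, 0.80, 0.10],
    "puppy": [0.85, 0.75, 0.15],
    "car":   [0.10, 0.20, 0.90],
}

def cosine(a, b):
    """Cosine similarity: near 1.0 means same direction (similar meaning)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine(embeddings["dog"], embeddings["puppy"]))  # near 1.0 - similar
print(cosine(embeddings["dog"], embeddings["car"]))    # much lower - unrelated
```

This is the mechanism behind semantic search and retrieval: embed the query, embed the documents, and rank by cosine similarity.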
Want to actually build with this?
Our Stack Builder picks the best AI tools for your specific project in under 60 seconds.
Build my stack →