How LLMs work, and why they struggled to count the Rs in the word strawberry
TL;DR - How LLMs function
Large language models are not explicitly optimized for correctness; they are optimized to predict how language continues. Confidence, hesitation, and lookup emerge from how easy or hard it is for language to flow forward.
What this page is
This page provides a clear, mechanical explanation of how modern large language models behave under the hood — why they appear confident or uncertain, when they seek external data, and why these behaviors emerge from language dynamics rather than belief, intent, or understanding.
It avoids marketing language, anthropomorphic metaphors, and unnecessary theory, focusing instead on calibration: by understanding what these systems are actually optimizing for, readers can better judge when to trust an answer, when to verify it, and why fluent language alone is not evidence of correctness.
Core thesis:
Almost every high-level behavior exhibited by modern language models — confidence, hesitation, external lookup, and visible failure modes — arises from how stable or unstable the language continuation is, not from whether the model “knows” or “believes” something to be true.
What language continuation stability means
Large language models do not perform an explicit truth-evaluation step in the human sense. Instead, they continually operate under a simpler, mechanical constraint:
If generation continues from this point, does the language naturally flow in one clear direction, or does it fragment into multiple plausible but incompatible paths?
When language flows cleanly toward a single continuation, outputs appear confident and decisive. When multiple competing continuations exist, outputs become cautious, hedged, exploratory, or trigger external lookup.
This behavior is not driven by belief, intention, or epistemic judgment; it is a direct consequence of the shape of the predictive distribution over next tokens, that is, how concentrated or diffuse the probabilities over possible continuations happen to be [1, 2].
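To make “concentrated or diffuse” concrete, here is a minimal Python sketch using made-up logits (the raw scores a model assigns to candidate next tokens before they are turned into probabilities). The numbers are invented for illustration; the point is only that the same softmax step yields a decisive distribution in one case and a near-tie in the other.

```python
import math

def softmax(logits):
    """Turn raw model scores (logits) into a probability distribution."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits for the next token at two different points in generation.
stable_logits   = [8.0, 3.0, 2.5, 1.0]   # one continuation clearly dominates
unstable_logits = [4.1, 4.0, 3.9, 3.8]   # several continuations are nearly tied

print([round(p, 3) for p in softmax(stable_logits)])    # ~[0.988, 0.007, 0.004, 0.001]
print([round(p, 3) for p in softmax(unstable_logits)])  # ~[0.289, 0.261, 0.236, 0.214]
```

In the first case one continuation absorbs almost all of the probability mass; in the second, four continuations are nearly interchangeable, which is the condition the rest of this page calls instability.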
The four major signals of instability
Instability does not arise from explicit rules. Instead, it manifests through a small number of recurring signals that indicate elevated uncertainty in language continuation.
1. Are there many different answers that could reasonably follow?
When multiple answers, timelines, definitions, or factual variants are all viable, no single continuation dominates. Probability mass spreads, entropy increases, and continuation stability declines (a sampling-based sketch after this list makes the first two signals concrete).
2. Do high-probability continuations already contain caveats?
Phrases such as “it depends,” “definitions vary,” or “there is no single answer” often emerge naturally when internal disagreement exists. This hedging is not a safety choice; it reflects underlying language uncertainty.
3. Does this type of information historically drift over time?
Some domains remain stable across decades, while others — leadership roles, policies, pricing, ownership — change frequently. Temporal inconsistency in training data introduces entropy before an answer is ever produced.
4. Has this prompt structure frequently failed when answered directly?
During training and evaluation, certain question patterns consistently produced confident but incorrect answers. Over time, the model learned that these patterns require decomposition, qualification, or verification.
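One way to observe these signals from the outside, without access to the model's internals, is to sample several independent answers to the same prompt and measure how much they disagree and how often they hedge. The sketch below is a hypothetical probe of this kind; the instability_probe helper, the hedge-phrase list, and the sample answers are all invented for illustration, and nothing like this runs inside the model itself.

```python
from collections import Counter

HEDGE_PHRASES = ("it depends", "definitions vary", "no single answer")

def instability_probe(sampled_answers):
    """Heuristic, external-only probe: how much do independently sampled
    answers agree (signal 1), and how often do they hedge (signal 2)?"""
    normalized = [a.strip().lower() for a in sampled_answers]
    counts = Counter(normalized)
    agreement = counts.most_common(1)[0][1] / len(normalized)
    hedge_rate = sum(any(p in a for p in HEDGE_PHRASES) for a in normalized) / len(normalized)
    return {"agreement": agreement, "hedge_rate": hedge_rate}

# Invented samples for a drift-prone question such as "Who is the current CEO of X?"
samples = ["Jane Doe", "John Smith", "It depends on when you ask", "Jane Doe", "John Smith"]
print(instability_probe(samples))  # {'agreement': 0.4, 'hedge_rate': 0.2}
```

Low agreement combined with frequent hedging is the external signature of a diffuse continuation landscape; a question whose sampled answers all match and carry no caveats is, by the same measure, stable.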
Entropy as the unifying concept
These signals are not independent mechanisms. They are different expressions of a single underlying phenomenon: entropy in the space of possible continuations [3].
- Competing answers increase entropy
- Early hedging reflects entropy
- Temporal drift introduces entropy
- Historical penalties reshape entropy
The model does not assess truth directly. It reacts to how ordered or disordered the continuation landscape appears.
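Entropy can be made concrete without formal machinery: it is a single number that is small when one continuation dominates and large when many compete. A minimal sketch, using made-up next-token probabilities:

```python
import math

def entropy_bits(probs):
    """Shannon entropy (in bits) of a next-token distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Made-up next-token probabilities at a single generation step.
ordered    = [0.90, 0.05, 0.03, 0.02]   # one continuation dominates
disordered = [0.30, 0.28, 0.22, 0.20]   # several continuations compete

print(round(entropy_bits(ordered), 2))     # ~0.62 bits: language flows one way
print(round(entropy_bits(disordered), 2))  # ~1.98 bits: the landscape is fragmented
```

Each of the four signals above is, in the end, a different route into the same high-entropy situation.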
Scope, applicability, and terminology
The behavioral model described on this page is not specific to a single vendor or system. As a high-level abstraction, it applies broadly to modern large language models built on transformer architectures and trained via next-token prediction.
This includes essentially all widely deployed general-purpose LLM families, such as those developed by OpenAI, Anthropic, Google, Meta, Mistral, and other contemporary research groups. Differences between these systems affect how frequently certain behaviors appear, not the underlying mechanism that produces them.
In academic and technical literature, the same dynamics described here using “language continuation stability” are often discussed under different but closely related terms. Common equivalents include:
- Entropy of the predictive distribution
- Token-level uncertainty
- Distribution sharpness or dispersion
- Logit entropy or logit variance
- Model calibration (informal usage)
- Decoding confidence (surface-level interpretation)
While these terms differ in mathematical framing and research context, they all describe closely related properties of the same underlying phenomenon: how concentrated or diffuse the set of plausible next tokens is at a given point in generation [3, 4, 5].
The explanation presented here intentionally avoids formal notation in favor of a mechanical, behavior-first description. This abstraction aligns with the technical reality of how language models operate while remaining accessible without requiring prior exposure to machine learning theory.
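In that spirit, a short sketch can still show why these terms travel together: they are different readouts of the same next-token logits. The logits below are made up, and top-1 probability is used only as one simple proxy for “sharpness.”

```python
import math
from statistics import pvariance

def softmax(logits):
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy_bits(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# One made-up set of next-token logits, read out three different ways.
logits = [4.1, 4.0, 3.9, 3.8]
probs = softmax(logits)

print(round(entropy_bits(probs), 2))  # "entropy of the predictive distribution" (~1.99 bits)
print(round(max(probs), 3))           # one proxy for "distribution sharpness" (~0.289)
print(round(pvariance(logits), 4))    # "logit variance" (0.0125)
```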
Why this is not “confidence in correctness”
Language models do not maintain an internal confidence score, belief state, or probability of correctness. What exists instead is a probability distribution over possible next tokens.
When that distribution is sharply peaked, language appears confident. When it is broad and diffuse, language appears cautious. Human readers often interpret this as epistemic confidence, but internally it is purely a matter of distribution shape.
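This is also why any “confidence score” derived from a model is better read as a fluency score. The sketch below uses invented per-token probabilities for two hypothetical answers; averaging token log-probabilities rewards the familiar phrasing regardless of whether the claim is true.

```python
import math

def mean_logprob(token_probs):
    """Average log-probability per token: a fluency score, not a truth score."""
    return sum(math.log(p) for p in token_probs) / len(token_probs)

# Invented per-token probabilities a model might assign to two short answers.
fluent_but_wrong = [0.95, 0.92, 0.90, 0.93]   # familiar phrasing, incorrect claim
correct_but_rare = [0.60, 0.41, 0.38, 0.55]   # correct claim, unusual phrasing

print(round(mean_logprob(fluent_but_wrong), 2))  # ~-0.08: reads as "confident"
print(round(mean_logprob(correct_but_rare), 2))  # ~-0.74: reads as "hesitant"
```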
Why external lookup restores stability
External search is not a truth validator. It is a stability-restoring operation.
Injecting authoritative text collapses competing continuations, reduces entropy, and allows generation to proceed smoothly [6]. Lookup is triggered when continuing without additional context would destabilize language flow.
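In retrieval-augmented setups [6], this usually amounts to placing the retrieved text ahead of the question so that the continuation is anchored to it. The sketch below is a minimal illustration under that assumption; the build_grounded_prompt helper and the passage are invented, standing in for a real retrieval step.

```python
def build_grounded_prompt(question, retrieved_passages):
    """Assemble a retrieval-augmented prompt: authoritative text is placed
    ahead of the question so the continuation is anchored to it."""
    context = "\n\n".join(f"[source {i + 1}] {p}" for i, p in enumerate(retrieved_passages))
    return (
        "Answer using only the sources below. If they do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

# Invented passage standing in for a real search result.
passages = ["Acme Corp announced Jane Doe as chief executive on 2024-03-01."]
print(build_grounded_prompt("Who is the CEO of Acme Corp?", passages))
```

With the source present, the continuation toward “Jane Doe” dominates; without it, several names and dates would compete and the distribution would spread.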
The strawberry problem
A useful illustration of this dynamic came from a surprisingly simple failure. For a period of time, early versions of ChatGPT would confidently give the wrong answer when asked how many times the letter “r” appears in the word “strawberry.” The question was trivial, clearly defined, and easily verifiable — yet the model often answered incorrectly, and did so with high confidence.
The issue mattered not because the task was important, but because it exposed a specific weakness. The language flowed smoothly toward a plausible answer based on pattern familiarity rather than structural analysis. Confidence emerged from low entropy in continuation, not from correctness.
Training did not introduce symbolic counters. Instead, it reduced reliance on gestalt language flow for structure-sensitive tasks. Once stability was restored, the visible failure disappeared.
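A small sketch makes the original gap concrete. Counting characters is trivial in ordinary code, but a subword-based model does not see characters by default; it sees opaque token IDs, so the count is not directly readable from its input. The token split shown is hypothetical; real boundaries depend on the tokenizer.

```python
# Counting characters is trivial for ordinary code:
word = "strawberry"
print(word.count("r"))  # 3

# A subword-based model, however, works over token IDs rather than characters.
# A tokenizer might split the word into pieces like these
# (hypothetical split; real boundaries depend on the tokenizer):
tokens = ["str", "aw", "berry"]

# The per-character structure is only recoverable if the pieces are unpacked,
# which is exactly the step that smooth, pattern-driven continuation skips.
print(sum(piece.count("r") for piece in tokens))  # 3
```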
The modern risk surface
Contemporary risks are quieter. They occur when language flows smoothly despite thin evidence — missing documentation, weak provenance, or incomplete sourcing.
These cases are hazardous precisely because entropy is low while justification remains insufficient.
Final mental model:
A language model does not ask “Am I correct?” It asks “Can language continue cleanly from here?” Confidence, hesitation, lookup, and failure all emerge from that single constraint.
References
[1] Brown, T. et al. “Language Models are Few-Shot Learners.” Advances in Neural Information Processing Systems, 2020.
[2] Radford, A. et al. “Improving Language Understanding by Generative Pre-Training.” OpenAI, 2018.
[3] Holtzman, A. et al. “The Curious Case of Neural Text Degeneration.” International Conference on Learning Representations, 2020.
[4] Guo, C. et al. “On Calibration of Modern Neural Networks.” International Conference on Machine Learning, 2017.
[5] Jiang, Z. et al. “How Can We Know When Language Models Know?” Transactions of the Association for Computational Linguistics, 2021.
[6] Lewis, P. et al. “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.” Advances in Neural Information Processing Systems, 2020.