Confidence ≠ Competence

Suggestions on how to critically review LLM output


Researchers have long known that people conflate confidence with competence. When someone sounds confident, we naturally infer that they know what they're talking about. But the inference is often wrong, which explains a lot about political and executive leadership.

So when we see LLM output, impeccably formatted and delivered with bravado, we subconsciously assume it is likely to be right. Resisting that pull is incredibly difficult, but here are some suggestions for spotting when things are less solid than they appear.

Watch for false specificity. LLMs generate precise-sounding numbers, dates, and citations that are often fabricated. The more specific a claim feels, the more it is worth verifying. Ask: is this detail load-bearing, or decoration?
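If you review LLM output programmatically, you can automate the first pass. Here is a minimal sketch in plain Python that flags precise-sounding details for manual verification; the regex patterns and categories are illustrative, not exhaustive.

```python
import re

# Patterns for details that sound precise and are often fabricated:
# years, percentages, dollar figures, and citation-style parentheticals.
# These patterns are illustrative, not exhaustive.
PATTERNS = {
    "year": r"\b(?:19|20)\d{2}\b",
    "percentage": r"\b\d+(?:\.\d+)?%",
    "money": r"\$\d[\d,]*(?:\.\d+)?\b",
    "citation": r"\([A-Z][a-z]+ et al\.,? (?:19|20)\d{2}\)",
}

def flag_specifics(text: str) -> list[tuple[str, str]]:
    """Return (kind, match) pairs for every precise-sounding detail found."""
    hits = []
    for kind, pattern in PATTERNS.items():
        for match in re.finditer(pattern, text):
            hits.append((kind, match.group(0)))
    return hits

sample = "Revenue grew 37.2% in 2019, to $4.1 billion (Smith et al., 2020)."
for kind, detail in flag_specifics(sample):
    print(f"verify this {kind}: {detail}")
```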

Check the structure-to-substance ratio. Models produce beautifully organized responses — headers, numbered lists, logical flow — creating a halo effect where the form of competence substitutes for the fact of it. If it looks like a McKinsey slide but contains no non-obvious insight, that’s a red flag.
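To make the ratio concrete, here is a crude sketch: the fraction of non-empty lines that are formatting scaffolding rather than prose. The detection rules and the 0.5 threshold are arbitrary choices of mine, not a validated metric.

```python
# A crude structure-to-substance proxy: what fraction of non-empty lines
# are scaffolding (headers, bullets, numbered items)? A high ratio does
# not prove emptiness, but it tells you where to look harder.
def scaffold_ratio(text: str) -> float:
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return 0.0

    def is_scaffold(ln: str) -> bool:
        # Markdown headers and bullets, or "1. ..." style numbered items.
        return ln.startswith(("#", "-", "*")) or ln.split(".")[0].isdigit()

    return sum(is_scaffold(ln) for ln in lines) / len(lines)

response = "# Plan\n- Step one\n- Step two\n1. Do it\nActual analysis here."
if scaffold_ratio(response) > 0.5:  # arbitrary threshold
    print("Mostly scaffolding; check whether the substance is there.")
```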

Probe the edges. Ask a follow-up that goes deeper on one specific claim. Confident-but-wrong outputs collapse under targeted questioning — the model will contradict itself or hedge suddenly. Genuine understanding holds up under interrogation; pattern-matched fluency doesn’t.
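If you are reviewing output from a model you can query, the probing can be scripted. Below is a minimal sketch using the OpenAI Python SDK as one example backend; the model name and the follow-up wording are assumptions, and any chat API would do.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; use whatever model you review
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Paste in the one specific claim you want to stress-test.
claim = "The claim extracted from the original answer goes here."

# Targeted follow-ups that attack the same claim from different angles.
probes = [
    f"Is the following claim true? Answer, then explain: {claim}",
    f"What is the strongest evidence against this claim? {claim}",
    f"Under what conditions would this claim be false? {claim}",
]

for probe in probes:
    print(f"PROBE: {probe}\nANSWER: {ask(probe)}\n")
# Read the answers side by side: contradiction or sudden hedging across
# probes is the collapse described above.
```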

Beware of “balanced” framing as a substitute for accuracy. Models default to “on one hand / on the other hand” structures that can mask overwhelming evidence on one side. If a genuine consensus is presented as a “debate,” that even-handedness is misleading.

Look for the absence of “I don’t know.” Humans with real expertise regularly volunteer uncertainty. Models rarely do. If a genuinely difficult question gets a totally smooth, confident answer, that smoothness is the tell.
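A blunt way to operationalize this: count uncertainty markers in the response. Zero hits on a genuinely hard question is the smoothness to distrust. A small sketch follows; the phrase list is my own and is illustrative, not a validated lexicon.

```python
# Count hedging phrases in a response. The list is illustrative only.
HEDGES = [
    "i don't know", "i'm not sure", "uncertain", "it depends",
    "roughly", "approximately", "i may be wrong", "not confident",
]

def hedge_count(text: str) -> int:
    lowered = text.lower()
    return sum(lowered.count(phrase) for phrase in HEDGES)

answer = "The answer is definitively X, for three clear reasons."
if hedge_count(answer) == 0:
    print("No uncertainty expressed; scrutinize harder.")
```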

Distinguish retrieval from reasoning. Models are much stronger at reorganizing known information than at novel reasoning or applying principles to unusual cases. If the task requires genuine inference, scrutinize more carefully.

Cross-reference one claim. Pick the single most important or surprising claim and verify it independently. If it checks out, confidence in the rest rises. If it doesn’t, treat the whole response as suspect.
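One scriptable version of this check: pose the claim to a second model and compare verdicts. That is only a weak proxy for independence, since models share training data, and a primary source is always better. The sketch below uses the OpenAI Python SDK; the model names are placeholders.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def verdict(model: str, claim: str) -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": f"Answer 'true' or 'false', one word only: {claim}",
        }],
    )
    return response.choices[0].message.content.strip().lower()

claim = "The most surprising claim from the original answer goes here."
verdicts = {m: verdict(m, claim) for m in ("gpt-4o-mini", "gpt-4o")}
print(verdicts)
# Disagreement means: stop and verify by hand before trusting the rest.
```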