Build an Anti-Hallucination Skill for Claude in 2026

By Yanni Papoutsis | 6 min read | 2026-06-12

An anti-hallucination skill is a Claude skill that enforces evidence-based claims: before stating anything as fact, Claude must verify it by reading a file, running a command, or checking an official source, cite where the evidence came from, and explicitly mark anything it could not verify. It does not make the model smarter; it makes unverified output visibly different from verified output, which is what actually protects your work.

TL;DR: Hallucinations hurt most when they are invisible: an invented API parameter, a plausible statistic, a URL that almost exists. An anti-hallucination skill counters this with three enforced behaviours. Verify first: claims about files, code, APIs, and data must be checked with a tool action (read, run, fetch) before being stated. Cite always: every external fact carries its source inline, and only sources confirmed to exist may be cited. Mark the rest: anything that cannot be verified in-session gets an explicit Unverified: label or is rewritten as a question. The skill is a single SKILL.md containing a claim-type table that maps each class of claim to its verification method, a list of banned behaviours (invented numbers, deep links recalled from memory, fake citations), and a pre-response check. The build steps, the full template, and the claim table follow.

Why does Claude need an anti-hallucination skill?

Because the default failure mode of any language model is confident plausibility, and plausibility is most dangerous in technical work where output gets executed, published, or invoiced. A skill cannot eliminate the failure mode, but it can force every factual claim through a verify, cite, or mark decision before it reaches you.

The three behaviours, precisely:

Verify before stating: claims about anything checkable in-session (files, code, command output, configuration) must be checked with the actual tool, not recalled.
Cite sources: externally sourced claims carry their origin inline, and the only URLs cited are well-known ones the model is certain exist, with root domains preferred over deep links.
Mark unverified claims: anything else is labelled (for example with an Unverified: prefix) or turned into a question for the user; it is never silently asserted.
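Taken together, the three behaviours form a small decision procedure. The Python sketch below models it under assumed names (`Claim`, `Action`, `decide`, and `render` are illustrative, not part of any Claude API): a claim checkable in-session must be verified, an external claim needs a confirmed source, and everything else is labelled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    VERIFY = "verify"  # checkable in-session: read, run, inspect
    CITE = "cite"      # external fact with a named, confirmed source
    MARK = "mark"      # everything else: label it or ask


@dataclass
class Claim:
    text: str
    checkable_in_session: bool    # about files, code, command output, config
    source: Optional[str] = None  # a source confirmed to exist, if any


def decide(claim: Claim) -> Action:
    """Route a claim through the verify / cite / mark decision.

    In-session checks win over citations: if you can read or run it, do that.
    """
    if claim.checkable_in_session:
        return Action.VERIFY
    if claim.source:
        return Action.CITE
    return Action.MARK


def render(claim: Claim, verified: bool = False) -> str:
    """Render a claim so unverified text never reads like a verified fact."""
    action = decide(claim)
    if action is Action.VERIFY and verified:
        return claim.text
    if action is Action.CITE:
        return f"{claim.text} (source: {claim.source})"
    # Checkable but not yet checked, check failed, or no source: label it.
    return f"Unverified: {claim.text}"
```

The notable design choice is that a checkable claim which has not actually been checked falls through to the Unverified: label rather than being stated plainly.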

How do you build the skill step by step?

Write a SKILL.md whose body is a claim-type table, a banned-behaviours list, and a pre-response checklist, then install it and wire it into your session bootstrap. Here is the build.

Step 1: Create the folder

mkdir -p anti-hallucination/references

anti-hallucination/
  SKILL.md
  references/
    claim-types.md   # extended examples per claim class

Step 2: Write the SKILL.md

---
name: anti-hallucination
description: 'Enforce evidence-based claims. Trigger on: factual claims about code, files, APIs, data, statistics, pricing, or features; phrases like "according to", "studies show", "it supports"; any research, comparison, or content task where accuracy matters.'
---

# Anti-Hallucination

Before stating any fact, classify it and apply the matching rule. Extended examples for each claim class are in references/claim-types.md.

| Claim type | Rule |
| --- | --- |
| File or code contents | Read the file first, quote what is there |
| Command or build behaviour | Run the command, report actual output |
| API or library behaviour | Check official docs; if unreachable, mark unverified |
| Statistics and numbers | Only from a source you can name; never invent |
| URLs | Only well-known roots you are certain exist |
| Market, legal, pricing | Verify or mark unverified; never improvise |

Banned behaviours:
- Inventing numbers, parameters, endpoints, or quotes
- Citing a source you have not seen this session
- Deep links recalled from memory instead of root domains
- Stating a guess in the same tone as a verified fact

Output rules:
- Verified claims: state plainly, cite inline where external.
- Unverified claims: prefix with "Unverified:" or ask instead.
- If verification failed, say what was tried and what blocked it.

Before sending any response, scan it once and ask: which sentence
here would I bet on? Fix or label every sentence you would not.
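SKILL.md files open with a YAML frontmatter block between two --- lines, and a description containing "Trigger on: ..." must be quoted, because an unquoted ": " inside a plain YAML value is a parse error. A minimal stdlib-only Python check catches both problems before install. The field limits (name: up to 64 lowercase letters, digits, or hyphens; description: non-empty, up to 1024 characters) are my understanding of the published constraints, so treat them as assumptions and confirm them on docs.claude.com; `parse_frontmatter` and `validate_skill` are hypothetical helper names, and the parser is deliberately not a general YAML parser.

```python
import re
from pathlib import Path

# Assumed constraints; verify against the Claude skills documentation.
NAME_RE = re.compile(r"^[a-z0-9-]{1,64}$")
MAX_DESCRIPTION = 1024


def parse_frontmatter(text: str) -> dict:
    """Parse top-level `key: value` lines between the leading `---` lines.

    Handles plain and quoted scalars only; enough for this skill's header.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter line")
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        raise ValueError("frontmatter is missing its closing '---' line") from None
    fields = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a key: value line: {line!r}")
        key, value = key.strip(), value.strip()
        if len(value) >= 2 and value[0] == value[-1] == "'":
            value = value[1:-1].replace("''", "'")  # YAML single-quote escape
        elif len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]  # escapes are left as-is in this sketch
        elif ": " in value:
            raise ValueError(f"quote the value of {key!r}: it contains ': '")
        fields[key] = value
    return fields


def validate_skill(text: str) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    try:
        fields = parse_frontmatter(text)
    except ValueError as exc:
        return [str(exc)]
    problems = []
    if not NAME_RE.match(fields.get("name", "")):
        problems.append("name must be 1-64 lowercase letters, digits, or hyphens")
    description = fields.get("description", "")
    if not description:
        problems.append("description is missing")
    elif len(description) > MAX_DESCRIPTION:
        problems.append(f"description exceeds {MAX_DESCRIPTION} characters")
    return problems


if __name__ == "__main__":
    print(validate_skill(Path("anti-hallucination/SKILL.md").read_text()) or "OK")
```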

Step 3: Decide your verification ladder

The skill needs a defined order of evidence, strongest first, so Claude reaches for the best available check rather than the easiest:

1. Direct inspection: read the file, run the command
2. Official documentation: docs.claude.com, modelcontextprotocol.io
3. Primary source: the vendor's own site, the spec
4. None available: mark unverified or ask the user
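Because the ladder is an ordering, it maps naturally onto an ordered enum: use the lowest-numbered rung that is actually available. A minimal Python sketch; `Evidence` and `strongest` are hypothetical names for illustration.

```python
from enum import IntEnum
from typing import Iterable


class Evidence(IntEnum):
    """Verification ladder: a lower value means stronger evidence."""
    DIRECT_INSPECTION = 1  # read the file, run the command
    OFFICIAL_DOCS = 2      # e.g. docs.claude.com, modelcontextprotocol.io
    PRIMARY_SOURCE = 3     # the vendor's own site, the spec
    NONE = 4               # mark unverified or ask the user


def strongest(available: Iterable[Evidence]) -> Evidence:
    """Pick the strongest rung that is available, not the easiest one."""
    return min(available, default=Evidence.NONE)
```

An empty list falls through to `Evidence.NONE`, which is the cue to label the claim or ask.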

For Claude-related claims I anchor on the Claude documentation and Anthropic's own site; for connector behaviour, the Model Context Protocol docs. The rule that prevents the classic fake-link hallucination is blunt: cite roots you are certain exist, never reconstructed deep paths.
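The roots-only rule is mechanical enough to lint. A minimal Python sketch using only `urllib.parse`; the allowlist below is simply the three roots in this article's Sources list, and `check_citation` and `to_root` are hypothetical helper names. A deep link is refused even on a trusted host, because the path is the part most likely to have been reconstructed from memory.

```python
from typing import Optional
from urllib.parse import urlsplit

# The roots this article cites; extend the set deliberately, not from memory.
KNOWN_ROOTS = {"docs.claude.com", "www.anthropic.com", "modelcontextprotocol.io"}


def check_citation(url: str) -> Optional[str]:
    """Return None if the URL is an allowed root, else the reason it fails."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return "not an https URL"
    if parts.hostname not in KNOWN_ROOTS:
        return f"{parts.hostname} is not a known root"
    if parts.path not in ("", "/") or parts.query or parts.fragment:
        return "deep link: cite the root instead"
    return None


def to_root(url: str) -> str:
    """Reduce a URL to scheme and host, e.g. when rewriting a deep link."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"
```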

Step 4: Package, install, and wire into bootstrap

zip -r anti-hallucination.skill anti-hallucination
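Run from the parent directory, that command stores entries under anti-hallucination/, so SKILL.md sits at anti-hallucination/SKILL.md inside the archive. Assuming the installer expects that layout (an assumption worth confirming in the Claude docs), a short Python check with the standard `zipfile` module catches a flat or mis-rooted archive before you upload it; `check_package` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path


def check_package(archive: Path, skill_name: str = "anti-hallucination") -> list[str]:
    """Check that a zipped skill holds <skill_name>/SKILL.md at the top level."""
    if not zipfile.is_zipfile(archive):
        return [f"{archive} is not a zip archive"]
    with zipfile.ZipFile(archive) as zf:
        names = set(zf.namelist())
    expected = f"{skill_name}/SKILL.md"
    if expected in names:
        return []
    return [f"missing {expected}; found {sorted(names)}"]
```

Usage: `check_package(Path("anti-hallucination.skill"))` returns an empty list when the layout matches.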

Install the .skill in Cowork or bundle it into a plugin (the packaging mechanics are in my guide to building a Claude plugin from scratch). Then add it to your session bootstrap so it loads on every research or content task automatically; that routing pattern is exactly what the Cowork navigator skill exists for.

Step 5: Test it with claims you know are wrong

Validation is simple: ask questions that invite confabulation and check that the labels appear.

Ask: "What does the parameter retry_backoff_max do in our config?"
Pass: Claude reads the config file before answering.
Fail: Claude explains a parameter that does not exist.

Ask: "Give me three statistics on AI adoption with sources."
Pass: sourced numbers, or "Unverified:" labels, or a refusal.
Fail: three tidy percentages with invented attributions.
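Both pass/fail criteria can be scripted if you capture sessions as data. The Python sketch below assumes a hypothetical transcript shape (a list of `Event` records with kind "read" or "text") and a hypothetical "source:" citation convention; adapt both to however you actually log sessions. A refusal contains no percentages, so it passes the second check, matching the criteria above.

```python
import re
from dataclasses import dataclass


@dataclass
class Event:
    kind: str     # hypothetical: "read" for a file read, "text" for reply text
    content: str  # file path for reads, message text for replies


def read_before_answer(events: list[Event], path: str) -> bool:
    """Test 1: pass only if `path` was read before any reply text appeared."""
    for event in events:
        if event.kind == "read" and event.content == path:
            return True
        if event.kind == "text":
            return False
    return False


PERCENT = re.compile(r"\d+(?:\.\d+)?%")


def stats_are_backed(reply: str) -> bool:
    """Test 2: every line with a percentage is sourced or labelled unverified."""
    for line in reply.splitlines():
        if PERCENT.search(line):
            sourced = "source:" in line.lower()
            labelled = line.lstrip("-* ").startswith("Unverified:")
            if not (sourced or labelled):
                return False
    return True
```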

Where does the skill matter most in practice?

Anywhere output crosses a trust boundary: published content, client deliverables, code that touches production, and anything with numbers in it. In my own stack it runs inside two pipelines: every post produced by the 21-agent GTM pipeline passes through it before the SEO stage, and research artefacts in the product pipeline carry unverified labels into review so humans know exactly which claims to check.
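Carrying labels into review only helps if reviewers can find them quickly. A minimal Python sketch, assuming labelled claims start their line with Unverified: (optionally after a list bullet); `review_items` and `review_checklist` are hypothetical names, and labels placed mid-line are not picked up by this version.

```python
def review_items(artefact: str, label: str = "Unverified:") -> list[tuple[int, str]]:
    """Collect (line number, claim) pairs for every labelled claim."""
    items = []
    for number, line in enumerate(artefact.splitlines(), start=1):
        stripped = line.strip().lstrip("-* ")  # tolerate list bullets
        if stripped.startswith(label):
            items.append((number, stripped[len(label):].strip()))
    return items


def review_checklist(artefact: str) -> str:
    """Format labelled claims as a checklist a human reviewer can work through."""
    return "\n".join(f"[ ] line {n}: {claim}" for n, claim in review_items(artefact))
```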

It also pairs naturally with publication checklists. My SEO and AI search checklist skill requires outbound citations on every article; the anti-hallucination skill is what guarantees those citations are real rather than decorative.

Frequently asked questions

Does an anti-hallucination skill stop all hallucinations?

No, and any claim that it does would itself be a hallucination. It reduces the rate by forcing tool-based verification where possible, and it makes the remainder visible through labelling, which changes hallucinations from landmines into flagged review items.

Will it slow Claude down?

Slightly and deliberately: verification means extra file reads, command runs, and occasional fetches. In exchange you stop paying the much larger cost of shipping wrong facts. Keep the verification ladder shallow for low-stakes tasks if speed matters.

What is the difference between this and just prompting "do not hallucinate"?

A one-line prompt is a vibe; a skill is a procedure. The skill defines claim classes, verification methods per class, banned behaviours, and output labels, and it loads consistently across sessions instead of depending on you remembering to ask.
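To make "claim classes with a verification method per class" concrete, here is the SKILL.md table expressed as data, with a toy keyword router in Python. The rule strings are copied from the table; the class keys, the keyword cues, and the function names are hypothetical. Claude classifies claims with its own judgement, so this illustrates the mapping, not how the model does it, and naive substring cues will misfire on real text (for instance "api" inside "rapid").

```python
from typing import Optional

# The claim-type table from SKILL.md as data: claim class -> rule.
CLAIM_RULES = {
    "file_or_code": "Read the file first, quote what is there",
    "command_or_build": "Run the command, report actual output",
    "api_or_library": "Check official docs; if unreachable, mark unverified",
    "statistics": "Only from a source you can name; never invent",
    "url": "Only well-known roots you are certain exist",
    "market_legal_pricing": "Verify or mark unverified; never improvise",
}

# Hypothetical keyword cues, checked in order; the first match wins.
CUES = [
    ("url", ("http://", "https://", "www.")),
    ("statistics", ("%", "percent", "studies show")),
    ("market_legal_pricing", ("price", "pricing", "licence", "license", "legal", "market")),
    ("command_or_build", ("build", "compile", "run ", "exit code")),
    ("api_or_library", ("api", "endpoint", "parameter", "it supports")),
    ("file_or_code", (".py", ".md", ".yaml", "config", "function")),
]


def classify(claim: str) -> Optional[str]:
    """Return the first claim class whose cue appears in the claim, if any."""
    text = claim.lower()
    for kind, cues in CUES:
        if any(cue in text for cue in cues):
            return kind
    return None


def rule_for(claim: str) -> str:
    """Look up the verification rule; unmatched claims default to marking."""
    kind = classify(claim)
    return CLAIM_RULES[kind] if kind else "No class matched: mark unverified or ask"
```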

How should unverified claims look in final output?

Visibly different from facts: a literal Unverified: prefix in working documents, or softened phrasing plus a question in client-facing text. The non-negotiable part is that verified and unverified statements must never read identically.
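The two presentation modes can be sketched as one Python function, assuming a boolean for the audience; `present` is a hypothetical name and the client-facing wording is only an example of softened phrasing plus a question. Whatever wording you choose, the invariant to preserve is that an unverified claim never renders identically to the same claim verified.

```python
def present(claim: str, verified: bool, client_facing: bool = False) -> str:
    """Make unverified text visibly different from verified text.

    Working documents get a literal prefix; client-facing text gets softened
    phrasing plus a question. Verified claims pass through unchanged.
    """
    if verified:
        return claim
    if client_facing:
        return f"We understand the following but have not confirmed it: {claim} Can you confirm?"
    return f"Unverified: {claim}"
```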

About the author

Yanni Papoutsis builds AI products, automation pipelines, and technical documentation with Claude, and publishes free tooling and guides at yanni.uk.

Next step: Make verification automatic by loading this skill from your Cowork navigator, and explore 1,000+ free AI tools at yanni.uk/ai-tools/.

Sources

Claude documentation (skills): https://docs.claude.com
Anthropic (model behaviour and safety): https://www.anthropic.com
Model Context Protocol: https://modelcontextprotocol.io