Build a Token Budget Skill to Cut Claude Costs in 2026
TL;DR: Most token waste is structural, not conversational: a 400-line build log pasted in full, a 2,000-line file read to answer a question about one function, five skills loaded when one was needed, and verbose replies restating what both sides already know. A Claude token budget skill encodes the counter-rules. Compression: answers lead with the result, use bullets over paragraphs, and never restate the question. Truncation: command output is capped (head and tail, grep before cat), and file reads use offsets and limits instead of whole-file loads. Loading discipline: skills keep bodies short and push detail to references loaded only when needed. Targets: the session has a stated budget, long outputs go to files instead of chat, and big tasks start fresh sessions rather than inheriting bloated context. The SKILL.md template with caps, the truncation patterns, and the session targets are below.
A Claude token budget skill is a skill that enforces spending rules on the context window: compressed responses, truncated tool output, capped file reads, and on-demand skill loading, with explicit context targets per session. Tokens are what you pay for with Claude, in both money and capacity, and a budget skill can cut consumption by a third or more, depending on how wasteful the baseline session is, without losing output quality.
Where do Claude sessions actually waste tokens?
Four places, in descending order of damage: untruncated tool output, oversized file reads, indiscriminate skill loading, and verbose prose. Everything that enters the context window stays there and is counted again on every subsequent turn, so early waste compounds for the whole session.
| Waste source | Typical cost | Budget rule |
|---|---|---|
| Untruncated command output | hundreds of lines per call | cap at 50 lines, filter first |
| Whole-file reads | thousands of lines per read | offset and limit, 200-line cap |
| Eager skill loading | every skill body, every session | load on demand, references later |
| Verbose responses | steady drain every turn | result first, bullets, no echo |
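To make the compounding concrete, here is a back-of-envelope sketch in shell. The figures are assumptions, not measurements: roughly 80 characters per line and 4 characters per token are rough heuristics, the function name `carried_tokens` is hypothetical, and the estimate ignores prompt caching.

```shell
#!/usr/bin/env bash
# Rough estimate only. ASSUMPTIONS (not taken from any tokenizer):
#   ~80 characters per line, ~4 characters per token.
CHARS_PER_LINE=80
CHARS_PER_TOKEN=4

# carried_tokens LINES TURNS
# Approximate tokens counted again over the rest of a session when
# LINES of tool output stay in context for TURNS further turns.
carried_tokens() {
  local lines=$1 turns=$2
  echo $(( lines * CHARS_PER_LINE / CHARS_PER_TOKEN * turns ))
}

carried_tokens 400 20   # full 400-line log kept for 20 turns: prints 160000
carried_tokens 30 20    # tail -n 30 of the same log: prints 12000
```

Under these assumptions the untruncated log costs more than ten times as much as its 30-line tail over the same 20 turns, which is why truncation sits at the top of the table.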
How do you build the token budget skill?
Write a SKILL.md that states hard caps, the truncation patterns, the loading rules, and the session targets, then load it from your bootstrap so it governs every session. Here is the build.
Step 1: Create the folder
mkdir -p token-budget
Step 2: Write the SKILL.md with hard caps
---
name: token-budget
description: "Reduce token consumption. Trigger on: session start, any tool output over 50 lines, any file read over 200 lines, repeated reads of the same file, or mentions of tokens, context, budget, compress, or expensive session."
---
# Token Budget
## Output compression
- Lead with the result. Explanation only if asked.
- Bullets over paragraphs; no preamble, no restating the task.
- Never paste content back that the user just provided.
## Tool output truncation
- Cap displayed command output at 50 lines.
- Filter before displaying: grep, head, tail, wc first.
- Summarise logs to failures plus counts, not full dumps.
## File read caps
- Default read: 200 lines with offset and limit; whole-file reads only as a stated exception.
- Locate before reading: search for the symbol, read around it.
- Never re-read a file already in context unless it changed.
## Skill loading discipline
- Load skills on demand, not speculatively.
- Skill bodies stay under 100 lines; detail lives in references/.
- Load references only at the moment they are needed.
## Session targets
- State a budget at session start for long tasks.
- Big outputs go to files; chat gets the path plus a summary.
- New major task = new session, not a bloated continuation.
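The "under 100 lines" rule for skill bodies can be checked mechanically rather than by eye. The following is a minimal sketch, assuming each SKILL.md starts with a `---` frontmatter block as in the template above; the function names `body_lines` and `check_body_cap` are hypothetical helpers, not part of any Claude tooling.

```shell
#!/usr/bin/env bash
# body_lines FILE
# Counts the lines that come after the closing '---' of the frontmatter.
# Assumes the file begins with a '---' ... '---' block.
body_lines() {
  awk '
    fences >= 2 { n++ }                      # past the frontmatter: count it
    /^---[ \t]*$/ && fences < 2 { fences++ } # opening and closing fence
    END { print n + 0 }
  ' "$1"
}

# check_body_cap FILE [MAX]
# Returns non-zero and warns when the body exceeds MAX lines (default 100).
check_body_cap() {
  local file=$1 max=${2:-100} n
  n=$(body_lines "$file")
  if [ "$n" -gt "$max" ]; then
    echo "over cap: $file body is $n lines (max $max)" >&2
    return 1
  fi
  echo "ok: $file body is $n lines"
}
```

Running `check_body_cap token-budget/SKILL.md` before packaging is one way to notice when a body has grown enough that detail should move into references/.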
Step 3: Encode the truncation patterns
Give Claude the exact command shapes so the cheap path is the default path:
# Instead of: cat build.log
tail -n 30 build.log
# Instead of: showing a whole test run
npm test 2>&1 | tail -n 20
# Instead of: reading a whole file to find one function
grep -n "applyDiscount" src/pricing.ts
# then read 40 lines around the match, not the file
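The "read 40 lines around the match" step can be wrapped in a small helper so it is as easy to type as cat. This is a sketch under stated assumptions: `read_around` is a hypothetical name, the default of 20 lines either side is chosen to give roughly 40 lines, and `grep -m` is assumed to be available (GNU and BSD grep both support it).

```shell
#!/usr/bin/env bash
# read_around PATTERN FILE [CONTEXT]
# Prints CONTEXT lines either side of the first match of PATTERN
# (default 20, so about 40 lines in total), prefixed with line numbers.
read_around() {
  local pattern=$1 file=$2 context=${3:-20} hit start end
  # Line number of the first match; empty when there is none.
  hit=$(grep -n -m 1 -- "$pattern" "$file" | cut -d: -f1)
  if [ -z "$hit" ]; then
    echo "no match for '$pattern' in $file" >&2
    return 1
  fi
  start=$(( hit > context ? hit - context : 1 ))
  end=$(( hit + context ))
  # Print only the window and stop reading once it is past.
  awk -v s="$start" -v e="$end" '
    NR > e  { exit }
    NR >= s { printf "%d: %s\n", NR, $0 }
  ' "$file"
}
```

With this in place, `read_around applyDiscount src/pricing.ts` replaces a whole-file read with one bounded window.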
The same logic applies to every tool, not just shell: list a directory before reading files from it, and fetch one page section rather than a whole document when the question is narrow.
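For output where both the start and the end matter, a head-and-tail cap keeps the two ends and drops the middle. The sketch below is one possible shape, not a prescribed implementation: `capped` is a hypothetical name, and the 50-line default mirrors the cap in the SKILL.md above.

```shell
#!/usr/bin/env bash
# capped [MAX]
# Reads stdin. Input of at most MAX lines (default 50) passes through
# unchanged; longer input is reduced to its first and last lines plus
# a one-line marker, so the result never exceeds MAX lines.
capped() {
  local max=${1:-50} half total tmp
  half=$(( (max - 1) / 2 ))   # leave room for the marker line
  tmp=$(mktemp) || return 1
  cat > "$tmp"
  total=$(wc -l < "$tmp" | tr -d ' ')
  if [ "$total" -le "$max" ]; then
    cat "$tmp"
  else
    head -n "$half" "$tmp"
    echo "... [$(( total - 2 * half )) lines omitted] ..."
    tail -n "$half" "$tmp"
  fi
  rm -f "$tmp"
}
```

Usage would look like `npm test 2>&1 | capped` or `cat build.log | capped 30`. Buffering to a temporary file is a deliberate trade-off: it costs disk I/O but lets the function count lines before deciding what to show.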
Step 4: Set session context targets
Caps control single actions; targets control trajectories. Three targets work well in practice: start heavy tasks below half of the context window, checkpoint progress to a file the moment a long session passes the halfway mark, and treat "summarise state, write it to disk, start fresh" as a normal move rather than a failure. Writing state to disk works best when filenames are predictable, which is exactly what the file organization skill guarantees.
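The checkpoint move is easier to make routine when the filename is computed rather than invented each time. Here is a minimal sketch; the `checkpoints/` directory, the date-plus-slug naming scheme, and both function names are illustrative assumptions rather than anything the file organization skill is known to prescribe.

```shell
#!/usr/bin/env bash
# checkpoint_path TASK [DATE]
# Builds a predictable checkpoint filename from a task description.
# The checkpoints/ directory and DATE-slug.md scheme are assumptions.
checkpoint_path() {
  local task=$1 day=${2:-$(date +%Y-%m-%d)} slug
  # Lowercase, then collapse every non-alphanumeric run into one hyphen.
  slug=$(printf '%s' "$task" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -cs 'a-z0-9' '-' \
    | sed 's/^-//; s/-$//')
  echo "checkpoints/${day}-${slug}.md"
}

# write_checkpoint TASK   (summary text on stdin)
# Saves the summary to disk and prints only the path, so the chat
# carries a short reference instead of the full state.
write_checkpoint() {
  local path
  path=$(checkpoint_path "$1")
  mkdir -p "$(dirname "$path")"
  cat > "$path"
  echo "$path"
}
```

A fresh session can then be started with nothing more than the printed path and an instruction to read it.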
Step 5: Package, install, and wire into bootstrap
zip -r token-budget.skill token-budget
Install it standalone or inside a plugin (see the pillar on building a Claude plugin from scratch), then have your Cowork navigator skill load it at every session start so the caps exist before the first expensive tool call. Anthropic's guidance on context management for agents at anthropic.com covers the underlying principle: context is a finite resource and agents perform best when it is curated.
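Before zipping, it can help to confirm that the folder contains a SKILL.md whose frontmatter carries the two fields the template uses, name and description. The check below is a sketch using plain line matching rather than a real YAML parser, and the function names `frontmatter_value` and `ready_to_package` are hypothetical.

```shell
#!/usr/bin/env bash
# frontmatter_value FILE KEY
# Prints the value of a top-level KEY from the leading '---' block.
# Simple line matching only; this is not a YAML parser.
frontmatter_value() {
  awk -v key="$2" '
    /^---[ \t]*$/ { fences++; next }
    fences == 1 && index($0, key ":") == 1 {
      print substr($0, length(key) + 2); exit
    }
    fences >= 2 { exit }
  ' "$1" | sed 's/^ *//'
}

# ready_to_package DIR
# Succeeds only if DIR/SKILL.md exists with non-empty name and description.
ready_to_package() {
  local file="$1/SKILL.md" key
  [ -f "$file" ] || { echo "missing $file" >&2; return 1; }
  for key in name description; do
    if [ -z "$(frontmatter_value "$file" "$key")" ]; then
      echo "missing '$key' in $file frontmatter" >&2
      return 1
    fi
  done
}
```

Chaining the two steps, as in `ready_to_package token-budget && zip -r token-budget.skill token-budget`, keeps an incomplete skill from being packaged.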
How does loading discipline interact with other skills?
Loading discipline is the multiplier: it decides what your other skills cost. The pattern that keeps a multi-skill setup affordable is "short bodies, deep references". Every skill carries a compact core that loads whenever the skill triggers, and its heavyweight material sits in references/ files that load only on demand. The skill-authoring guidance in the Claude documentation describes this progressive disclosure model.
The effect is biggest on large systems. My 21-agent product pipeline holds 21 skills, yet a session running stage 07 loads one stage body plus the shared contract, not the other 20, and artefacts pass between stages as files on disk rather than as chat history. That is the difference between a pipeline that fits in budget and one that drowns in its own context.
Frequently asked questions
How many tokens does a budget skill actually save?
It depends on your worst habits, which is the point: the skill targets the structural waste. Sessions heavy on builds, logs, and large files see the biggest gains because truncation and read caps bite hardest there; the savings show up directly in plan capacity and API spend.
Does compressing output make Claude's answers worse?
No. Compression rules target redundancy, not substance: preambles, restated questions, and full dumps where a summary plus a file path carries the same information. Anything cut by mistake is one follow-up question away.
Should the caps ever be broken?
Yes, deliberately. Debugging sometimes needs a full stack trace, and a final review sometimes needs a whole file. The skill's rule is that exceeding a cap must be a stated decision with a reason, never a default.
Is it better to start a new session than continue a long one?
Once context is bloated, usually yes: checkpoint state to a file, start clean, and load the checkpoint. You lose nothing that was written down and shed everything that was dead weight, and the new session runs faster and cheaper.
About the author
Yanni Papoutsis builds AI products, automation pipelines, and technical documentation with Claude, and publishes free tooling and guides at yanni.uk.
Next step: Make the caps automatic by loading this skill from your Cowork navigator, and explore 1,000+ free AI tools at yanni.uk/category/all/.
Sources
- Claude documentation (skills and context): https://docs.claude.com
- Anthropic (context management for agents): https://www.anthropic.com
- Model Context Protocol: https://modelcontextprotocol.io