
Agent Skills and claude-lint

This article is part of the Oracles, Traces, Triage series.

The temptation

The first thing most people do with a CLAUDE.md file is write a recipe. Step 1, do this. Step 2, do that. If you see an error, run this command. Here’s a code block you can paste.

It works. For about a week. Then the codebase shifts, the recipe goes stale, and the model follows outdated instructions with the confidence of someone who doesn’t know they’re wrong.

I’ve seen this pattern before. It’s the same failure mode as over-specified test fixtures: the more you hard-code the steps, the more brittle the system becomes. The test passes for the wrong reasons. The agent succeeds for the wrong reasons.

Context should shape reasoning, not script behavior

The distinction matters in practice. When a .claude/ directory emphasizes workflows over norms, the model tends to follow outdated instructions to the letter. When the directory is structured around principles and facts, the model adapts more readily as the codebase and the task change.

Whether this counts as “reasoning from principles” in any deep sense is an open question. Either way, the resulting output is consistently better and more relevant.

Think about it from a testing perspective. A unit test that asserts f(3) == 7 checks one input. A property that asserts for all x: f(f_inverse(x)) == x checks the relationship.

Change how f computes internally and the property still holds—it only cares that the roundtrip works. The hard-coded assertion breaks the moment the mapping shifts.
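
Here is that contrast as code, sketched with Rust and the proptest crate; f and f_inverse are made-up stand-ins, not functions from any real project.

```rust
use proptest::prelude::*;

// Hypothetical pair of functions, standing in for `f` and its inverse.
fn f(x: i64) -> i64 { x * 2 + 1 }
fn f_inverse(y: i64) -> i64 { (y - 1) / 2 }

// Hard-coded assertion: checks exactly one input, breaks if the mapping shifts.
#[test]
fn f_of_three_is_seven() {
    assert_eq!(f(3), 7);
}

proptest! {
    // Property: only the roundtrip relationship matters, for any input in range.
    #[test]
    fn roundtrip_holds(x in -1_000_000i64..1_000_000) {
        prop_assert_eq!(f_inverse(f(x)), x);
    }
}
```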

Same idea. A CLAUDE.md that says “run cargo test after every change” is a hard-coded assertion. A CLAUDE.md that says “all changes must pass the existing test suite” is a property. The model can figure out how to run the tests. What it needs from you is what matters.

The layers

Over time, I’ve settled on a layered structure for .claude/ directories:

| Layer | What belongs | What doesn’t |
|---|---|---|
| CLAUDE.md | Norms, facts, project conventions | Workflow verbs (“step 1”, “then do”), code blocks |
| agents/*.md | Perspective, values (≤120 lines) | Procedures, code blocks |
| skills/*/SKILL.md | Capabilities (≤500 lines) | Success criteria, code blocks |
| references/*.md | Playbooks, optional reference material | Missing “optional” declaration |
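
On disk, the layering looks roughly like this (file names are illustrative, echoing the examples below):

```
.claude/
├── CLAUDE.md              # norms, facts, conventions
├── agents/
│   └── code-quality.md    # perspective and values, ≤120 lines
├── skills/
│   └── running-fuzzers/
│       └── SKILL.md       # what the capability is, ≤500 lines
└── references/
    └── deployment.md      # optional playbook, declared as such
```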

CLAUDE.md is the constitution. Short. Declarative. “This project uses Rust.” “Tests must pass before commits.” “Prefer explicit error handling over unwrap.” No instructions on how to do things—just what matters.

Agents get a perspective. If you have a code-quality agent, it gets values like “favor readability over cleverness” and “flag any function longer than 40 lines.” It doesn’t get a checklist.

Skills describe capabilities the model can use—not step-by-step procedures. A skill for “running fuzzers” says what the fuzzer does, what inputs it expects, what success looks like at a high level. It does not contain a bash script.

References are the escape hatch. Sometimes you genuinely need a playbook—a deployment procedure, a migration guide. References hold those, but they must declare themselves as optional. The model should know these are reference material, not marching orders.

claude-lint

A Rust CLI tool called claude-lint helps enforce these patterns by checking .claude/ directories for violations.

```
$ claude-lint .claude
ok: .claude passes all checks

$ claude-lint /path/to/.claude
error: /path/to/.claude/CLAUDE.md: contains workflow verb 'step 1'
error: /path/to/.claude/skills/foo/SKILL.md: contains fenced code block
2 error(s)
```

It checks for:

- workflow verbs and fenced code blocks where they don’t belong
- agent files over 120 lines and skill files over 500 lines
- reference files that never declare themselves optional

It’s deliberately strict. The point is not to make .claude/ directories pleasant to read. The point is to keep them in the shape where I’ve seen the model produce the best results.
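
For a sense of what a check like the workflow-verb scan involves, here is a minimal sketch in Rust. It is only an illustration of the idea, not claude-lint’s actual implementation.

```rust
use std::fs;
use std::path::Path;

// Illustrative subset of phrases treated as workflow verbs.
const WORKFLOW_VERBS: &[&str] = &["step 1", "then do"];

fn check_claude_md(path: &Path) -> Vec<String> {
    let mut errors = Vec::new();
    let Ok(content) = fs::read_to_string(path) else {
        return errors;
    };
    let lower = content.to_lowercase();
    for &verb in WORKFLOW_VERBS {
        if lower.contains(verb) {
            errors.push(format!("{}: contains workflow verb '{verb}'", path.display()));
        }
    }
    // Fenced code blocks start with a run of three backticks.
    let fence = "`".repeat(3);
    if content.contains(fence.as_str()) {
        errors.push(format!("{}: contains fenced code block", path.display()));
    }
    errors
}
```

A real linter would also walk the whole directory and apply the per-layer rules: the line limits on agents and skills, and the “optional” declaration on references.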

Why this matters in practice

This is how I use Claude Code in practice: structured context for exploring edge cases, generating test harnesses, and reasoning about state-machine invariants. The quality of the output tracks the quality of the context it is given.

When I embed workflows, the model sticks to them—even when they’re wrong for the current situation. When I embed norms (“never skip precondition checks”, “all state transitions must be tested for idempotence”), I get output that adapts to whatever the model finds in the codebase.
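
To make the second norm concrete, here is a sketch of what it might cash out to, again assuming proptest; State, Event, and apply are hypothetical stand-ins for whatever the codebase actually defines.

```rust
use proptest::prelude::*;

// Hypothetical state machine, standing in for the real one.
#[derive(Clone, Debug, PartialEq)]
struct State { count: u32 }

#[derive(Clone, Debug)]
enum Event { Reset, SetTo(u32) }

fn apply(state: &State, event: &Event) -> State {
    match event {
        Event::Reset => State { count: 0 },
        Event::SetTo(n) => State { count: *n },
    }
}

proptest! {
    // The norm as a property: applying the same event twice must land on
    // the same state as applying it once.
    #[test]
    fn transitions_are_idempotent(start in any::<u32>(), n in any::<u32>()) {
        let initial = State { count: start };
        for event in [Event::Reset, Event::SetTo(n)] {
            let once = apply(&initial, &event);
            let twice = apply(&once, &event);
            prop_assert_eq!(once, twice);
        }
    }
}
```

The norm states what must hold; the shape of the test is left to the model.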

Whether that’s “reasoning from norms” or just the model having more room to draw on its training, I can’t say for certain. What I can say is that the parallel to property-based testing feels right. Properties tell the system what must hold. The system figures out how to check it. Norms tell the model what matters. The model figures out how to act on it.

Same shape. I’ll take it.


Next: Agent Teams and claude-swarm