In 2026, “agentic” tooling is moving fast enough that yesterday’s workflow advice goes stale quickly. This series is my attempt to write down the parts that seem durable: how to give agents norms instead of scripts, how to coordinate multiple agents through the repo, and how those patterns connect back to fuzzing and stateful testing.
Skepticism is healthy here. Agent outputs still need oracles, reproducibility, and a bar for correctness. The tools are changing quickly; the craft is not.
Agents can draft tests fast; the hard part is still choosing the right oracles and insisting on reproducible failures.
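For concreteness, here is a minimal sketch of that distinction using Hypothesis; `normalize_path` is a made-up function standing in for whatever the agent drafted tests against, not anything from this series. The oracle is an explicit property (idempotence), and the pinned example plus `print_blob` are what keep a failure reproducible instead of anecdotal.

```python
# A sketch, not the series' code: a property test with an explicit oracle
# and a pinned, reproducible failure. `normalize_path` is hypothetical.
from hypothesis import example, given, settings, strategies as st


def normalize_path(p: str) -> str:
    # Hypothetical system under test: collapse duplicate slashes.
    while "//" in p:
        p = p.replace("//", "/")
    return p


@settings(print_blob=True)                      # print a reproduction token on failure
@given(st.text(alphabet="/ab", max_size=20))    # generated inputs
@example("a//b")                                # once a failure surfaces, pin it forever
def test_normalize_is_idempotent(p: str) -> None:
    # The oracle: normalizing twice must equal normalizing once.
    once = normalize_path(p)
    assert normalize_path(once) == once
```

An agent can draft the `@given` line in seconds; deciding that idempotence is the right property, and pinning the failing input once it shows up, is the part that still needs judgment.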
Examples use Claude Code because that’s what I run day-to-day, but the patterns are meant to travel to any agent that can read a codebase, run checks, and write down findings.
This is not a tutorial. It’s a practitioner’s notebook.
If there’s a unifying theme here, it’s that most bug-finding systems succeed or fail on three things: good oracles, reproducible failures, and a clear bar for correctness.
That’s also a useful way to read the series:
.claude/ context so agents adapt from norms instead of blindly following scripts (plus a linter to keep it from drifting).
If you want the deeper motivation for “why traces,” start here:
Part 5 — Self-hosted agents on Runpod (and friends)
Turning inference into a reliable test service: latency/cost knobs, guardrails, artifacts, and how to run agent loops against real repos.
Part 6 — Quantization as a feature: cheap tests when deep reasoning isn’t needed
Using smaller/quantized models for throughput work (scaffolding, formatting, test expansion) and reserving big models for judgment-heavy steps.
Part 7 — Corpus, shrink, triage: turning agent output into a fuzzing pipeline
How to dedupe/minimize failures and turn “agent finds” into reproducible bug packets and long-lived regression corpora.
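As a rough sketch of the dedupe step before Part 7 arrives (the real pipeline will look different; `Failure` and `signature` are illustrative names, not its API): bucket failures by a normalized signature so that a pile of agent-reported crashes collapses into a handful of distinct bugs before anyone spends time minimizing.

```python
# Rough sketch of failure dedupe, assuming each failure carries a message
# and a stack trace. Names and shapes here are illustrative only.
import hashlib
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Failure:
    message: str
    stack: list[str]  # frames as "file:function:line" strings


def signature(f: Failure, top_frames: int = 3) -> str:
    # Strip details that vary run to run (line numbers, addresses),
    # then hash the top of the stack plus the normalized message.
    norm = [re.sub(r"\d+", "N", frame) for frame in f.stack[:top_frames]]
    norm.append(re.sub(r"0x[0-9a-f]+|\d+", "N", f.message))
    return hashlib.sha256("|".join(norm).encode()).hexdigest()[:16]


def dedupe(failures: list[Failure]) -> dict[str, list[Failure]]:
    # One bucket per signature; keep every instance for later minimization.
    buckets: dict[str, list[Failure]] = defaultdict(list)
    for f in failures:
        buckets[signature(f)].append(f)
    return dict(buckets)
```

Normalizing before hashing is the whole trick: without it, every run produces “new” failures that differ only in line numbers or pointer values.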
Most of what I know about testing came from shipping production systems and learning in public through open source: contributing to AutoFixture starting around 2011, then maintaining Hedgehog, which once powered Echidna, an early and widely used property-based fuzzer for Ethereum smart contracts.
Along the way: Fare for regex-constrained test generation, a SplitMix port for reproducible failure discovery, and consensus fuzzers at Stacks that caught a production bug a 533-line integration test couldn’t reproduce.
That background is why I’m interested in AI tooling—not as a replacement for any of this, but as a way to do more of it.
The ideas in this series come from daily practice—shipping agent-assisted testing tools for real protocol security work. But daily practice has blind spots.
If you think I’m wrong about something, I’d like to hear it. If you think I’m right but missing a nuance, I’d especially like to hear that.