Agents as Fuzzers
This article is part of the Oracles, Traces, Triage series.
The short version
A fuzzer is a search tool whose results must be triaged. An AI agent is a search tool whose results must be triaged. The parallel is not metaphorical. I think it’s structural.
A fuzzer explores the input space of a program, looking for inputs that violate some oracle—a crash, a hang, a property violation. When it finds something, you triage: real bug? Duplicate? Exploitable?
An AI agent explores the solution space of a problem, looking for code, fixes, or tests that satisfy some goal. When it produces something, you triage: correct? Complete? Does it address the problem?
Both search. Both produce results that need judgment. Both waste enormous time if pointed in the wrong direction.
The anatomy, side by side
Every fuzzer does four things:
- Generates inputs (random, mutational, grammar-based, coverage-guided)
- Executes the target with those inputs
- Checks an oracle (crash? new coverage? property violation?)
- Saves interesting results for triage
Every AI agent does the same four things:
- Generates candidates (from prompt, codebase, agent skills)
- Executes or applies them (writes code, runs tests, modifies files)
- Checks an oracle (tests pass? linter clean? invariants hold?)
- Saves results for triage (commits, PRs, logs)
Replace “inputs” with “candidates” and “crash” with “test failure.” The structure is identical.
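The shared skeleton can be written down directly. Here is a minimal sketch in Python, with a toy parser standing in for the target; `search_loop`, `parse`, and the planted NUL-byte bug are all invented for illustration:

```python
import random

def search_loop(generate, execute, oracle, iterations=1000):
    """The four-step skeleton shared by fuzzers and agents:
    generate, execute, check an oracle, save hits for triage."""
    findings = []
    for _ in range(iterations):
        candidate = generate()       # random bytes, or a proposed patch
        result = execute(candidate)  # run the target, or apply the change
        if oracle(result):           # crash? failing test? broken invariant?
            findings.append((candidate, result))  # saved for triage, not auto-trusted
    return findings

# Toy target: a "parser" with a planted bug on NUL bytes.
def parse(data: bytes) -> int:
    if b"\x00" in data:
        raise ValueError("unexpected NUL byte")
    return len(data)

def fuzz_generate() -> bytes:
    return bytes(random.randrange(256) for _ in range(8))

def fuzz_execute(data: bytes):
    try:
        parse(data)
        return None
    except Exception as exc:
        return exc  # the only signal a context-blind fuzzer gets

findings = search_loop(fuzz_generate, fuzz_execute, oracle=lambda r: r is not None)
```

Swapping in an agent means replacing `fuzz_generate` with a context-aware proposer and the oracle with something richer than "did it raise?"; the loop itself does not change.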
What changes when the searcher understands context
Traditional fuzzers are context-blind. AFL doesn’t know what a function does. libFuzzer doesn’t understand the specification. They compensate with volume—millions of executions per second.
Context-blindness has costs:
- Shallow oracles. “Did it crash?” works. “Does this violate the protocol invariant?” requires a custom harness—often harder to write than the code being tested.
- Redundant exploration. Without understanding structure, the fuzzer wastes cycles in uninteresting regions of input space.
- Triage burden. Many findings are duplicates, benign panics, or expected edge cases. You sort the signal from the noise.
An AI agent, by contrast:
- Can read the specification
- Can reason about which inputs trigger interesting behavior
- Can write its own oracle and generate inputs designed to challenge it
The search becomes intentional without becoming rigid.
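As a sketch of what "writing its own oracle" can mean: the invariant becomes an executable predicate, and the inputs are chosen to stress it rather than drawn at random. The target here is Python's own `posixpath.normpath`; the idempotence property is an illustrative choice, not something from this article:

```python
import posixpath

def oracle(p: str) -> bool:
    """Spec-level invariant: normalizing a path twice must equal
    normalizing it once (idempotence). Richer than "did it crash?"."""
    once = posixpath.normpath(p)
    return posixpath.normpath(once) == once

# Inputs chosen *because* they stress the spec's edge cases
# (empty paths, redundant separators, '.' and '..' segments),
# not because a mutator happened to stumble on them.
targeted = ["a//b", "a/./b", "a/../../b", "./", "", "/..", "x/.."]
failures = [p for p in targeted if not oracle(p)]
```

An agent that can read the `normpath` docstring can both propose the invariant and pick the inputs; a coverage-guided fuzzer can do neither without a hand-written harness.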
The convergence
Combine the pieces from this series:
| Piece | Fuzzer equivalent | What it adds |
| --- | --- | --- |
| Agent skills | Oracle | Richer than “did it crash?”—norms that agents translate into testable properties |
| Agent swarms | Multiple seeds | Parallel search where each instance can specialize, sharing findings via git |
| Stateful testing | Execution loop | For traces instead of single inputs |
Together: context-aware search, parallel exploration, rich oracles.
Fuzzers still win at
- Speed. Millions of executions/sec with a simple oracle (“did it crash?”). AFL and libFuzzer are unbeatable here.
- Binary targets. No source code, no spec? Blind fuzzing is often the only option.
- Deterministic reproduction. Fuzzers produce exact inputs. Agent traces may need work to become deterministic.
- Corpus management. Mature fuzzers have corpus minimization, coverage tracking, seed scheduling. Agent ecosystems don’t—yet.
Agents win at
- Rich invariants. “Does this sequence of state transitions preserve safety properties?” An agent can both formulate and check the invariant.
- Spec-guided search. When the spec exists and is readable, agents generate targeted campaigns rather than relying on coverage alone.
- Triage. An agent can produce a root-cause hypothesis before you ever see the failure. It can check for duplicates.
- Harness generation. Writing fuzz harnesses is expert work. Agents can draft them from specs and iterate.

The spectrum
| | Traditional Fuzzer | AI Agent |
| --- | --- | --- |
| Input generation | Random / mutational / grammar | Context-aware / intentional |
| Oracle | Crash / coverage / property | Natural-language norm → property |
| Speed | Millions of executions/sec | Seconds to minutes per session |
| Context understanding | None | Deep |
| Triage | Manual | Agent-assisted |
| Parallelism | Independent seeds | Coordinated via git |
The gap is narrowing. What matters is understanding which tool fits which problem—and being willing to combine them.
In practice
- Traditional fuzzers for the fast, low-level search—serialization, encoding edge cases, roundtrip invariants. Simple oracles, enormous input spaces. Volume wins.
- AI agents for the slow, high-level search—stateful invariants, cross-component interactions, spec compliance. Complex oracles, understanding required. Context wins.
- Both together—agents generating hypotheses and fuzz harnesses, fuzzers executing at speed, agents triaging the results.
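A minimal sketch of the hand-off, with invented findings and field names: the fast search emits raw crashes, and the first agent-side triage step collapses them by crash signature before the expensive, context-aware analysis sees them:

```python
import hashlib

# Invented raw findings from a fuzzing run: input plus call trace.
raw_findings = [
    {"input": b"\x00AAAA", "trace": ["parse", "read_header", "memcpy"]},
    {"input": b"\x00BBBB", "trace": ["parse", "read_header", "memcpy"]},
    {"input": b"\xff\xff", "trace": ["parse", "read_body"]},
]

def signature(trace) -> str:
    # Hash the top frames so variants of one bug collapse into one bucket.
    return hashlib.sha256("|".join(trace[:3]).encode()).hexdigest()[:12]

unique = {}
for finding in raw_findings:
    unique.setdefault(signature(finding["trace"]), finding)  # keep first exemplar

# Three raw crashes, two distinct signatures: only two candidates
# reach the slower triage steps (root-cause hypothesis, duplicate check).
```

Deduplication by top-of-stack hash is a standard fuzzing-triage heuristic; what the agent adds comes after this step, on the survivors.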
Fifteen years ago, fuzzing was barely known outside security research. AFL and OSS-Fuzz made it standard practice. Today it’s table stakes.
AI-assisted testing is on the same trajectory.