Agents as Fuzzers
This article is part of the Oracles, Traces, Triage series.
The short version
A fuzzer is a search tool whose results must be triaged. An AI agent is a search tool whose results must be triaged. The parallel is not metaphorical. I think it’s structural.
A fuzzer explores the input space of a program, looking for inputs that violate some oracle—a crash, a hang, a property violation. When it finds something, you triage: real bug? Duplicate? Exploitable?
An AI agent explores the solution space of a problem, looking for code, fixes, or tests that satisfy some goal. When it produces something, you triage: correct? Complete? Does it address the problem?
Both search. Both produce results that need judgment. Both waste enormous time if pointed in the wrong direction.
The anatomy, side by side
Every fuzzer does four things:
- Generates inputs (random, mutational, grammar-based, coverage-guided)
- Executes the target with those inputs
- Checks an oracle (crash? new coverage? property violation?)
- Saves interesting results for triage
Every AI agent does the same four things:
- Generates candidates (from prompt, codebase, agent skills)
- Executes or applies them (writes code, runs tests, modifies files)
- Checks an oracle (tests pass? linter clean? invariants hold?)
- Saves results for triage (commits, PRs, logs)
Replace “inputs” with “candidates” and “crash” with “test failure.” The structure is identical.
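The shared skeleton can be written down directly. Here is a minimal sketch in Python, with a toy parser standing in for the target; `search_loop`, `parse`, and the planted NUL-byte bug are all invented for illustration:

```python
import random

def search_loop(generate, execute, oracle, iterations=1000):
    """The four-step skeleton shared by fuzzers and agents:
    generate, execute, check an oracle, save hits for triage."""
    findings = []
    for _ in range(iterations):
        candidate = generate()       # random bytes, or a proposed patch
        result = execute(candidate)  # run the target, or apply the change
        if oracle(result):           # crash? failing test? broken invariant?
            findings.append((candidate, result))  # saved for triage, not auto-trusted
    return findings

# Toy target: a "parser" with a planted bug on NUL bytes.
def parse(data: bytes) -> int:
    if b"\x00" in data:
        raise ValueError("unexpected NUL byte")
    return len(data)

def fuzz_generate() -> bytes:
    return bytes(random.randrange(256) for _ in range(8))

def fuzz_execute(data: bytes):
    try:
        parse(data)
        return None
    except Exception as exc:
        return exc  # the only signal a context-blind fuzzer gets

findings = search_loop(fuzz_generate, fuzz_execute, oracle=lambda r: r is not None)
```

Swapping in an agent means replacing `fuzz_generate` with a context-aware proposer and the oracle with something richer than "did it raise?"; the loop itself does not change.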
What changes when the searcher understands context
Traditional fuzzers are context-blind. AFL doesn’t know what a function does. libFuzzer doesn’t understand the specification. They compensate with volume—millions of executions per second.
Context-blindness has costs:
- Shallow oracles. “Did it crash?” works. “Does this violate the protocol invariant?” requires a custom harness—often harder to write than the code being tested.
- Redundant exploration. Without understanding structure, the fuzzer wastes cycles in uninteresting regions of input space.
- Triage burden. Many findings are duplicates, benign panics, or expected edge cases. You sort the signal from the noise.
An AI agent, by contrast:
- Can read the specification
- Can reason about which inputs trigger interesting behavior
- Can write its own oracle and generate inputs designed to challenge it
The search becomes intentional without becoming rigid.
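As a sketch of what "writing its own oracle" can mean: the invariant becomes an executable predicate, and the inputs are chosen to stress it rather than drawn at random. The target here is Python's own `posixpath.normpath`; the idempotence property is an illustrative choice, not something from this article:

```python
import posixpath

def oracle(p: str) -> bool:
    """Spec-level invariant: normalizing a path twice must equal
    normalizing it once (idempotence). Richer than "did it crash?"."""
    once = posixpath.normpath(p)
    return posixpath.normpath(once) == once

# Inputs chosen *because* they stress the spec's edge cases
# (empty paths, redundant separators, '.' and '..' segments),
# not because a mutator happened to stumble on them.
targeted = ["a//b", "a/./b", "a/../../b", "./", "", "/..", "x/.."]
failures = [p for p in targeted if not oracle(p)]
```

An agent that can read the `normpath` docstring can both propose the invariant and pick the inputs; a coverage-guided fuzzer can do neither without a hand-written harness.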
The convergence
Combine the pieces from this series:
| Piece | Fuzzer equivalent | What it adds |
| --- | --- | --- |
| Agent skills | Oracle | Richer than “did it crash?”—norms that agents translate into testable properties |
| Agent swarms | Multiple seeds | Parallel search where each instance can specialize, sharing findings via git |
| Stateful testing | Execution loop | For traces instead of single inputs |
Together: context-aware search, parallel exploration, rich oracles.
Fuzzers still win at
- Speed. Millions of executions/sec with a simple oracle (“did it crash?”). AFL and libFuzzer are unbeatable here.
- Binary targets. No source code, no spec? Blind fuzzing is often the only option.
- Deterministic reproduction. Fuzzers produce exact inputs. Agent traces may need work to become deterministic.
- Corpus management. Mature fuzzers have corpus minimization, coverage tracking, seed scheduling. Agent ecosystems don’t—yet.
Agents win at
- Rich invariants. “Does this sequence of state transitions preserve safety properties?” An agent can both formulate and check the invariant.
- Spec-guided search. When the spec exists and is readable, agents generate targeted campaigns rather than relying on coverage alone.
- Triage. An agent can produce a root-cause hypothesis before you ever see the failure. It can check for duplicates.
- Harness generation. Writing fuzz harnesses is expert work. Agents can draft them from specs and iterate.

The spectrum
| | Traditional Fuzzer | AI Agent |
| --- | --- | --- |
| Input generation | Random / mutational / grammar | Context-aware / intentional |
| Oracle | Crash / coverage / property | Natural-language norm → property |
| Speed | Millions of executions/sec | Seconds to minutes per session |
| Context understanding | None | Deep |
| Triage | Manual | Agent-assisted |
| Parallelism | Independent seeds | Coordinated via git |
The gap is narrowing. What matters is understanding which tool fits which problem—and being willing to combine them.
In practice
- Traditional fuzzers for the fast, low-level search—serialization, encoding edge cases, roundtrip invariants. Simple oracles, enormous input spaces. Volume wins.
- AI agents for the slow, high-level search—stateful invariants, cross-component interactions, spec compliance. Complex oracles, understanding required. Context wins.
- Both together—agents generating hypotheses and fuzz harnesses, fuzzers executing at speed, agents triaging the results.
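A minimal sketch of the hand-off, with invented findings and field names: the fast search emits raw crashes, and the first agent-side triage step collapses them by crash signature before the expensive, context-aware analysis sees them:

```python
import hashlib

# Invented raw findings from a fuzzing run: input plus call trace.
raw_findings = [
    {"input": b"\x00AAAA", "trace": ["parse", "read_header", "memcpy"]},
    {"input": b"\x00BBBB", "trace": ["parse", "read_header", "memcpy"]},
    {"input": b"\xff\xff", "trace": ["parse", "read_body"]},
]

def signature(trace) -> str:
    # Hash the top frames so variants of one bug collapse into one bucket.
    return hashlib.sha256("|".join(trace[:3]).encode()).hexdigest()[:12]

unique = {}
for finding in raw_findings:
    unique.setdefault(signature(finding["trace"]), finding)  # keep first exemplar

# Three raw crashes, two distinct signatures: only two candidates
# reach the slower triage steps (root-cause hypothesis, duplicate check).
```

Deduplication by top-of-stack hash is a standard fuzzing-triage heuristic; what the agent adds comes after this step, on the survivors.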
Fifteen years ago, fuzzing was barely known outside security research. AFL and OSS-Fuzz made it standard practice. Today it’s table stakes.
AI-assisted testing is on the same trajectory.