
Agent Teams and claude-swarm

This article is part of the Oracles, Traces, Triage series.

One agent hits a ceiling

A single Claude Code session can do one thing at a time. For small tasks—fix this function, write that test—that’s fine. But the work I care about is not small: exploring multiple hypotheses in parallel, maintaining documentation while debugging, running specialized analysis while generating test harnesses.

One agent, one task, one context window. It doesn’t scale.

The agent-team pattern

In early February 2026, Anthropic published Building a C Compiler with Large Language Models—a detailed account of 16 Claude instances working in parallel to produce a 100,000-line Rust-based C compiler capable of building the Linux kernel. The total: nearly 2,000 Claude Code sessions, 2 billion input tokens, 140 million output tokens.

The architecture was surprisingly simple. No orchestrator. No message bus. No shared memory. Just git.

Each agent ran in a Docker container. Each cloned a shared bare repo, worked on a task, and pushed. When two agents tried to claim the same task, git rejected the second push, forcing that agent to pick something else. Merge conflicts happened often; Claude was smart enough to resolve them.

The key insight: coordination through the codebase itself. The repo is the shared state. Commits are the messages. Locks are text files.
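
In shell terms, claiming a task under this scheme is just committing a small text file and letting the push race decide who got it. The sketch below assumes a locks/ directory and a main branch; those names are illustrative, not Anthropic's actual layout.

git pull --rebase origin main
if [ -e locks/refactor-parser.lock ]; then
    echo "already claimed, pick another task"; exit 0
fi
mkdir -p locks
echo "agent-3 $(date -u +%FT%TZ)" > locks/refactor-parser.lock
git add locks/refactor-parser.lock
git commit -m "claim: refactor-parser"
# if another agent claimed it first, this push is rejected and we choose again
git push origin main || echo "lost the race, pick a different task"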

claude-swarm

This pattern can be implemented through claude-swarm—a reusable harness currently wired for running multiple Claude Code sessions in Docker containers, coordinating through git. The coordination pattern itself is tool-agnostic; claude-swarm is just one concrete implementation.

export ANTHROPIC_API_KEY="sk-ant-..."    # credentials used by every agent
export AGENT_PROMPT="path/to/prompt.md"  # the prompt each session starts from
./tools/claude-swarm/launch.sh start     # bring up the agent containers
./tools/claude-swarm/launch.sh status    # check on them
./tools/claude-swarm/launch.sh stop      # tear them down

The design is minimal by conviction, not by laziness:

Host                        /tmp (bare repos)
~/project/ ── git clone ──> project-upstream.git (rw)
               --bare       project-mirror-*.git (ro)
                                     |
                                     | docker volumes
                                     |
               .---------------------+---------------------.
               |                     |                     |
           Container 1          Container 2          Container 3
           /upstream  (rw)      /upstream  (rw)      /upstream  (rw)
           /mirrors/* (ro)      /mirrors/* (ro)      /mirrors/* (ro)
               |                     |                     |
               v                     v                     v
           /workspace/          /workspace/          /workspace/
           (agent-work)         (agent-work)         (agent-work)

All containers mount the same bare repo. When one agent pushes, others see the changes on the next fetch. Each container runs harness.sh, which clones, resets to origin/agent-work, runs one Claude session, and loops. Agents stop after a configurable number of idle sessions with no commits.
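
A simplified sketch of that loop, for flavor; the paths, variable names, and the exact claude invocation are illustrative rather than lifted from harness.sh:

IDLE_LIMIT=${IDLE_LIMIT:-3}    # illustrative name for the configurable idle cutoff
idle=0
while [ "$idle" -lt "$IDLE_LIMIT" ]; do
    cd /workspace && rm -rf repo
    git clone /upstream repo && cd repo
    git checkout -B agent-work origin/agent-work   # reset to the shared branch
    before=$(git rev-parse HEAD)

    claude -p "$(cat "$AGENT_PROMPT")"             # one Claude Code session

    if [ "$(git rev-parse HEAD)" = "$before" ]; then
        idle=$((idle + 1))                         # nothing committed this session
    else
        idle=0
        git push origin agent-work                 # publish the work for the others
    fi
done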

Why no orchestrator

The temptation is always to add a coordinator—something that assigns tasks, monitors progress, resolves conflicts. This approach avoids orchestration for the same reason it avoids workflow verbs in CLAUDE.md: centralized control tends to reduce both the agents' autonomy and the quality of their reasoning.

With no orchestrator, each agent must orient itself. It reads the README, checks the current state of the code, looks at what other agents have done, and decides what to work on next. This mirrors how good engineering teams actually function: shared context, local autonomy, coordination through artifacts.
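
Concretely, the start of a session looks more like a survey than an assignment; something along these lines, where the locks/ and notes/ directories are illustrative:

git fetch origin
cat README.md                               # what the project is trying to do
git log --oneline -20 origin/agent-work     # what the other agents just did
ls locks/ notes/                            # what is claimed, what has been tried and failed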

Anthropic’s experience confirmed the pattern. Their agents maintained running docs of failed approaches. They took locks on tasks by writing text files. They specialized naturally—one agent coalescing duplicate code, another improving performance, another working on documentation.

Specialization is possible, not required

Right now, all agents in claude-swarm share the same prompt. They self-organize by looking at the repo and picking different things to work on.

Anthropic’s experience suggests that per-agent prompts—one focused on code quality, another on test coverage, another on documentation—can help at scale. claude-swarm supports that (just point AGENT_PROMPT at different files per container), but in practice a shared prompt is usually enough to start with: agents tend to self-organize without specialized prompts.
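
A hypothetical way to wire that up, assuming launch.sh can be invoked once per agent rather than once for the whole swarm; the prompt file names are made up:

AGENT_PROMPT="prompts/code-quality.md"  ./tools/claude-swarm/launch.sh start   # agent 1
AGENT_PROMPT="prompts/test-coverage.md" ./tools/claude-swarm/launch.sh start   # agent 2
AGENT_PROMPT="prompts/docs.md"          ./tools/claude-swarm/launch.sh start   # agent 3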

This connects to the agent skills philosophy: the prompt shapes behavior. The harness just runs the loop.

When it works, when it doesn’t

Agent swarms work best when the problem decomposes into independent sub-tasks—many distinct failing tests, different modules, separate components. Each agent picks a different piece, and parallelism is trivial.

They struggle when the problem is monolithic. Anthropic hit this when compiling the Linux kernel: every agent would find the same bug, fix it independently, and overwrite each other’s changes. Their solution was to use GCC as an oracle and randomly split compilation between GCC and their compiler, letting each agent work on different failing file subsets.
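
The shape of that split is easy to sketch; this is not Anthropic's actual harness, and the compiler binary, flags, and 50/50 ratio are illustrative:

# route each file to GCC or the work-in-progress compiler at random, so each
# agent's run produces a different subset of failing files to work on
for src in $(find . -name '*.c'); do
    obj="${src%.c}.o"
    if [ $((RANDOM % 2)) -eq 0 ]; then
        gcc -c "$src" -o "$obj"
    elif ! ./rustcc -c "$src" -o "$obj"; then
        echo "$src" >> failing-files.txt       # this agent works from this list
    fi
done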

For testing work, the decomposition is usually natural. Different invariants to test. Different modules to fuzz. Different state-machine paths to explore. The swarm pattern fits.

What this is really about

claude-swarm is about 200 lines of shell. It’s not the point.

The point is that the agent-team pattern—N autonomous agents, shared codebase, no central control—is a genuine paradigm for how AI-assisted work can scale. It’s not about making one agent smarter. It’s about making many agents productive together, the same way you’d make a team of engineers productive: clear context, local ownership, shared truth in the repo.

The C compiler was the proof of concept. Fuzz testing is where I’m applying it.


Next: Testing the Bugs Between Calls