I once accidentally spawned 60 Claude agents at the same time.
My machine started grinding to a halt. Each agent was spinning up its own exploration subagents, which were spinning up their own subagents—exponentially multiplying like some kind of AI hydra. I needed to grep and kill every Claude process, but I couldn't remember the exact command. So I opened a new terminal tab and asked Claude for help.
Except Claude was now slow as hell because my entire system was choking on agent processes. I sat there watching my computer crawl, genuinely worried I was about to cook my machine and burn through my API credits in one catastrophic cascade.
I did eventually kill them all. And then I kept building.

Because despite the chaos, I'd stumbled onto something that fundamentally changed how I work: async agents unlock long autonomous orchestration. The ability to hand off complex, multi-step workflows and walk away for 15+ minutes while the system works—without drifting into chaos.
"Async agents enable sustained trust: hand off work and enter flow state elsewhere for 15+ minutes."
Flow state requires this sustained trust.
The Blocking Problem
Before async agents, I was doing batch parallelism. The workflow went like this:
- Create a comprehensive plan with tasks organized into batches
- Each batch contains independent tasks that can run in parallel
- Instruct Claude to spawn agents for all tasks in batch 1 simultaneously
- Wait for entire batch to complete
- Move to batch 2, repeat
Claude spawns multiple agents in parallel as part of the same group of function calls—this is where the real compute advantage comes from. More agents running simultaneously means more work done faster.
But it's still fundamentally blocking.
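The batch pattern above can be sketched with a thread pool. This is a minimal illustration of the blocking behavior, not klaude's actual code; `run_task` is a stand-in for spawning a Claude subagent and waiting on its result:

```python
from concurrent.futures import ThreadPoolExecutor

def run_task(name: str) -> str:
    # Stand-in for spawning a subagent and blocking until it returns.
    return f"{name}: done"

def run_batches(batches):
    results = []
    with ThreadPoolExecutor() as pool:
        for batch in batches:
            # Every task in a batch runs in parallel...
            futures = [pool.submit(run_task, t) for t in batch]
            # ...but the next batch cannot start until all of them finish.
            results.extend(f.result() for f in futures)
    return results

results = run_batches([["schema", "api-contract"], ["frontend", "backend"]])
```

The per-batch barrier is the whole problem: `f.result()` holds the loop until the slowest task in the batch returns, even when the next batch depends on only one of them.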
You can't start batch 2 until batch 1 finishes, even if most of batch 1 is done and some batch 2 tasks only depend on one completed batch 1 task. The artificial batching creates bottlenecks. An API contract definition finishes, but the frontend and backend agents can't start until the database schema task completes—even though they don't depend on it.
And the bigger problem: constant interruptions. Investigation leads to planning leads to implementation—serial by necessity, since you need the investigation to inform the plan, and the plan before implementing. But even within implementation, you're constantly waiting. Finish the API contract, then do frontend, then do backend. Serial hell.
Well, mostly serial. Once you have the API contract defined, frontend and backend can happen simultaneously. But that requires coordination the batching system couldn't handle elegantly.
I start a new project every few days. Speed defines the workflow.
Enter klaude
klaude is a wrapper around the Claude CLI that acts identically to Claude but extends its behavior for subagent coordination. It's not a different model or a different interface—it's just Claude with more powerful orchestration capabilities.
The key features:
Async execution
Spawn a subagent and keep working while it runs in the background. The parent agent doesn't block.
This means:
- Dependent tasks start immediately when dependencies complete—no waiting for artificial batches
- Root agent can generate documentation while continuing validation
- Root agent can trigger research agents while attempting bug fixes—taking a first stab at the problem while deeper exploration runs in parallel
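A dependency-driven version of the same work, sketched with `asyncio` (a toy model, not klaude's implementation): each task awaits only its own dependencies, so frontend and backend start the moment the API contract finishes, regardless of the schema task.

```python
import asyncio

async def run_task(name, deps):
    # Wait only on this task's own dependencies, not on a whole batch.
    if deps:
        await asyncio.gather(*deps)
    await asyncio.sleep(0)  # stand-in for the subagent's real work
    return f"{name}: done"

async def main():
    # The API contract gates frontend and backend; the schema gates neither.
    schema = asyncio.create_task(run_task("schema", []))
    contract = asyncio.create_task(run_task("api-contract", []))
    frontend = asyncio.create_task(run_task("frontend", [contract]))
    backend = asyncio.create_task(run_task("backend", [contract]))
    return await asyncio.gather(schema, contract, frontend, backend)

results = asyncio.run(main())
```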
Checkout/enter subagents
You can "enter" a subagent, switching the CLI conversation to that agent. When you're done, exit back to the parent.
This is critical for requirements gathering. The user needs to be part of the discussion to answer questions, but you don't want investigation notes polluting the main orchestrator's context. The parent agent spawns a requirements-gathering subagent, checks it out, the user participates in the conversation, then the agent writes a final requirements document and exits—returning clean, processed requirements to the orchestrator without the conversational cruft.
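The context-isolation idea fits in a few lines. Everything here is hypothetical (`chat_with_user` stands in for the checked-out conversation); the point is that only the distilled document crosses the boundary back to the orchestrator:

```python
def requirements_subagent(chat_with_user):
    # The subagent keeps its own transcript; none of it leaks upward.
    transcript = []
    for question in ["What are the core features?", "Any constraints?"]:
        transcript.append((question, chat_with_user(question)))
    # Only the distilled document is returned on exit.
    return "\n".join(f"- {q} {a}" for q, a in transcript)

def orchestrator(chat_with_user):
    # The orchestrator's context holds the summary, not the conversation.
    return requirements_subagent(chat_with_user)
```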
Non-Claude subagents
klaude can spawn GPT-5, Gemini 2.5 Pro, Grok, or other models as subagents. It's a bit hacky, but it works.
Having a GPT-5 subagent for reasoning-heavy tasks is genuinely useful. Spawning 30 Grok agents for simple repetitive fixes—like a dumber Haiku 4.5 running in parallel—gets the job done fast.
Subagents have their own MCPs
This one solved a huge problem. Medium-sized MCPs take up ~10k tokens each. Load Perplexity, Google Search, Reddit Search, and GitHub Search all at once and that's ~40k tokens of context gone before you've written a single line of code.

"Lock heavy MCPs behind dedicated research subagents. Keep your main agent lean, spawn specialists when needed."
With klaude, I lock those MCPs behind a dedicated research subagent. The main workhorse agent stays lean, but the research agent has access to everything it needs when spawned. MCP bloat kills context windows. Delegating MCPs to specialized subagents fixes that.
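A rough sketch of the gating idea, with a hypothetical profile config (not klaude's actual schema) and the ~10k-token-per-MCP figure from above:

```python
# Hypothetical agent profiles (not klaude's actual config schema):
# heavy search MCPs are attached only to the research subagent.
PROFILES = {
    "main": {"mcps": []},  # the orchestrator stays lean
    "research": {"mcps": [
        "perplexity", "google-search", "reddit-search", "github-search",
    ]},
}

TOKENS_PER_MCP = 10_000  # rough per-MCP context cost cited in the text

def mcp_overhead(profile: str) -> int:
    # Context tokens consumed by MCP definitions for a given profile.
    return len(PROFILES[profile]["mcps"]) * TOKENS_PER_MCP
```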
Subagents can spawn subagents
There's a maximum depth and you can restrict which agents can spawn which types, but this enables recursive expansion of compute. If a research subagent uncovers a lot of information, it can spawn further subagents to explore different areas in parallel.
This is also where the 60-agent explosion happened. Use carefully.
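The depth limit amounts to a guarded recursion. A toy model (klaude's actual mechanism may differ), with tasks as nested dicts:

```python
def spawn(task, depth=0, max_depth=2):
    # task = {"name": ..., "subtasks": [...]}. Each level may fan out,
    # but recursion is cut off at max_depth, preventing runaway trees.
    if depth >= max_depth:
        return []  # refuse to spawn: depth limit reached
    spawned = [task["name"]]
    for sub in task.get("subtasks", []):
        spawned += spawn(sub, depth + 1, max_depth)
    return spawned

tree = {"name": "research", "subtasks": [
    {"name": "area-a", "subtasks": [{"name": "deep-dive"}]},
    {"name": "area-b"},
]}
```

With `max_depth=2`, `deep-dive` is never spawned; raise the limit and the tree grows another level. The guard is cheap, which is why it belongs at the spawn call rather than anywhere downstream.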
The Payoff
The best example I have: 15,000 lines of backend code in 12 hours.
I had a very comprehensive plan. The code wasn't complex—repository/service/controller layers for an Express app—but it was a lot of surface area. I set up the orchestration workflow with async parallel agents and let it run.
It executed perfectly. Well, almost. The tests were completely wrong. Claude is terrible at TDD out of the box, and I'm not a big TDD person anyway—so I hadn't noticed how happy it was to write useless tests that validated mock behavior instead of real logic. I deleted them all.
But the implementation code? Flawless.
I could hand off the work and focus on something else for long stretches. Fifteen minutes of autonomous work without needing to check in. Enough time to enter flow state on a different problem.

Constant 2-minute prompts where Claude asks for clarification or waits on dependencies destroy focus. Just enough time to get bored and context-switch, killing productivity.
"Long autonomous orchestration enables trust: sustained focus for 15+ minutes at a time."
The Chaos
I'm not going to pretend this is polished. The learning curve is high. This is the wild west of AI dev tools—everyone has their own system they think is "best," and I'm living and breathing this one daily.
The distinction between orchestrating agents and leaf agents is still sketchy. Orchestrating agents read workflows and spawn subagents. Leaf agents actually do the work—gathering requirements, planning, implementing. Recursive subagent orchestration can spiral out of control fast (see: 60 agents). Everything feels hacky.
The original version was held together with hooks, duct tape, and prayer. I had hooks that blocked on task tool calls and triggered separate agent processes in the background, with other hooks that alerted the main agent automatically. There was a script the main agent could run to await background agents. It worked—barely.
The current version is cleaner but still rough around the edges. This is under active development. I'm figuring it out as I go.
Why It Matters
Async agents enable workflows that couldn't exist before.
Requirements gathering with user interaction while keeping the orchestrator's context clean. Parallel frontend/backend implementation once the API contract is defined. Long autonomous work while you focus elsewhere. Documentation generated in the background. Research agents exploring multiple areas simultaneously while the main agent continues implementation.
These workflows require coordination that single-agent systems can't handle elegantly. They unlock a kind of productivity where you can trust the agent to run for 15+ minutes and come back to real progress.
Speed matters for flow state. If I can work on something else for 30 minutes while an agent runs, without it drifting into chaos, I can actually focus on that other thing. With constant 2-minute prompts, I never get the chance to concentrate.
The game changer.
Async agents unlock long autonomous orchestration. That changes everything about AI-assisted development.