
Sub-agent delegation patterns that actually work

Sub-agents aren't a magic concurrency trick. Used right, they parallelize research and protect context. Used wrong, they're expensive token thrashing. Here's how to tell the difference.

agents · claude-agent-sdk · sub-agents · ai-architecture

Sub-agents — spawning isolated agent invocations from a parent — became the hot pattern in 2025 and by now everyone has one. Most aren't using them for the right reasons.

The question isn't "should I add sub-agents?" It's "does my work actually have properties that make sub-agents pay off?" Three properties matter. If you don't have them, sub-agents cost more than they save.

The three properties

Sub-agents earn their keep when your work has:

  1. Independent sub-tasks — each sub-agent's work doesn't depend on another's output
  2. Context-heavy sub-tasks — the raw material for each sub-task would bloat the parent's context window if kept around
  3. Specialist-suitable sub-tasks — a narrower prompt with narrower tools produces a better result than a general prompt

If you have two of the three, sub-agents probably help. If you have all three, they're the right call. One or zero — do the work in the parent.
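
As a checklist, that rule of thumb compresses to a few lines. This is a sketch — the property names are ours, not from any SDK:

```python
def should_delegate(independent: bool, context_heavy: bool, specialist_fit: bool) -> bool:
    """Return True when sub-agent delegation is likely to pay off.

    Two of the three properties is the threshold from the rule of thumb.
    """
    score = sum([independent, context_heavy, specialist_fit])
    return score >= 2
```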

Pattern 1: Parallel research

The classic win. The parent issues several Task calls at once; each sub-agent researches one topic independently; the parent receives compressed summaries.

Parent: Compare Pinecone, Weaviate, and Qdrant for our workload.
[Task] Research Pinecone pricing, scale, clustering. 200 words.
[Task] Research Weaviate pricing, scale, clustering. 200 words.
[Task] Research Qdrant pricing, scale, clustering. 200 words.

Three parallel sub-agents, each burning through its own web fetches and documentation reads. Parent sees 600 words of synthesized output, not 60K tokens of raw pages. Context stays clean for the final comparison.

Rule: require summarized output with a word cap. A sub-agent that returns "the full details" defeats the point.
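
A minimal sketch of the fan-out, assuming your runtime exposes something like a `run_subagent` coroutine — the name is ours; in the Claude Agent SDK this would be a Task call. The stub here just echoes so the sketch runs standalone:

```python
import asyncio

async def run_subagent(prompt: str) -> str:
    # Stand-in for your runtime's Task call; here it just echoes a summary.
    await asyncio.sleep(0)
    return f"summary for: {prompt[:40]}"

async def parallel_research(topics: list[str], word_cap: int = 200) -> list[str]:
    prompts = [
        f"Research {t} pricing, scale, clustering. "
        f"Return at most {word_cap} words."
        for t in topics
    ]
    # Fan out all Task calls at once; gather preserves input order.
    return await asyncio.gather(*[run_subagent(p) for p in prompts])

results = asyncio.run(parallel_research(["Pinecone", "Weaviate", "Qdrant"]))
```

Note the word cap is baked into every prompt, not left to the sub-agent's judgment.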

Pattern 2: Specialist personas

Parent orchestrates across specialists, each with a narrower tool set and a tighter system prompt:

  • researcher sub-agent: WebFetch, Read — gathers info
  • reviewer sub-agent: Read, Grep — code review
  • writer sub-agent: Read, Write — drafting

The parent plans. Specialists execute. This works because specialized prompts + specialized tools outperform a generalist agent with 30 tools on complex tasks.
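
The tool scoping above can be captured as a plain registry the parent consults before spawning. A sketch — the tool names mirror the bullets, the dispatch mechanics are ours:

```python
# Hypothetical specialist registry; tool names mirror the bullet list above.
SPECIALISTS = {
    "researcher": {"tools": ["WebFetch", "Read"], "prompt": "Gather information only."},
    "reviewer":   {"tools": ["Read", "Grep"],     "prompt": "Review code; do not edit."},
    "writer":     {"tools": ["Read", "Write"],    "prompt": "Draft prose from notes."},
}

def tools_for(role: str) -> list[str]:
    """The parent binds only this allowlist when spawning the specialist."""
    return SPECIALISTS[role]["tools"]
```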

When it doesn't work: when the sub-tasks need the parent's full context. Passing 50K tokens of prior conversation to a sub-agent defeats the isolation.

Pattern 3: Context protection

You have a long-running parent agent doing careful reasoning. You need to do a one-off tool-heavy task that would dump 30K tokens of output into the context window — logs, file dumps, API listings. Delegate it:

[Task] Check /var/log/app/ for errors in the last 24h.
       Return only the top 5 unique error patterns with counts.

Sub-agent pulls the logs, processes them, returns a clean summary. Parent's context stays clean for the reasoning it's doing.

This is the pattern most teams miss. Sub-agents as context firewalls, not just as parallelism.
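
The sub-agent's side of the log example might look like this — a sketch of the compression step, with a deliberately simplistic pattern normalization:

```python
from collections import Counter
import re

def top_error_patterns(log_text: str, k: int = 5) -> list[tuple[str, int]]:
    """What the sub-agent returns to the parent: patterns + counts, not raw logs."""
    # Strip digits from each ERROR line so variants ("timeout after 30s",
    # "timeout after 45s") collapse into one pattern. Simplistic on purpose.
    errors = [
        re.sub(r"\d+", "N", line.split("ERROR", 1)[1].strip())
        for line in log_text.splitlines()
        if "ERROR" in line
    ]
    return Counter(errors).most_common(k)
```

Thirty thousand tokens of logs in, a five-line summary out — that asymmetry is the whole pattern.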

When sub-agents make things worse

Three anti-patterns we see constantly:

"Delegating trivial work"

The parent spawns a sub-agent for a single tool call. The sub-agent spin-up (model cold start, tool binding, first token latency) costs more than just doing the work. You paid 2× and waited longer.

"Sub-agent that needs parent's full context"

The sub-agent prompt includes 20K tokens of history "so it knows what's going on". You've now paid for that context twice and lost the isolation benefit. The work belongs in the parent.

"Unbounded parallel spawn"

Parent decides to delegate to 20 sub-agents at once. 20× rate limit consumption; 20 things to log; 20 things that can fail. Cap parallel spawn count — 5 at a time is plenty for most workflows.
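
Capping is a one-liner with a semaphore. A sketch, assuming an async runtime and a `run_task` callable you supply:

```python
import asyncio

MAX_PARALLEL = 5  # cap from the text; tune per workflow
_sem = asyncio.Semaphore(MAX_PARALLEL)

async def spawn_capped(run_task, prompt: str):
    # At most MAX_PARALLEL Task calls are in flight at once; the rest queue.
    async with _sem:
        return await run_task(prompt)

async def fan_out(run_task, prompts: list[str]) -> list:
    return await asyncio.gather(*[spawn_capped(run_task, p) for p in prompts])
```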

Prompt sub-agents like a smart colleague

Sub-agents start with nothing. No parent history, no prior context. Prompt them the way you'd brief a contractor who just walked in:

  • Goal: "Research X for purpose Y"
  • Constraints: "Use only public sources; don't cover marketing fluff"
  • Output shape: "Return ≤200 words with bullet points per metric"
  • Success criteria: "Focus on pricing at 10M-scale, not free-tier limits"

Vague prompts produce vague results. The sub-agent has no context to infer what you meant.
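
A briefing template keeps those four fields from getting skipped. A sketch — the field labels are ours:

```python
def brief(goal: str, constraints: str, output_shape: str, success: str) -> str:
    """Assemble a contractor-style briefing for a context-free sub-agent."""
    return (
        f"Goal: {goal}\n"
        f"Constraints: {constraints}\n"
        f"Output shape: {output_shape}\n"
        f"Success criteria: {success}"
    )
```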

The verification step

Sub-agents hallucinate, loop, and misunderstand like any agent. Parent must verify critical sub-agent output before acting on it. Two patterns:

  1. Cross-check: parent prompts a second sub-agent to verify the first's conclusion
  2. Constraint check: parent runs a deterministic check on the output (schema, length, citation presence)

For research, one sub-agent returning a plausible-sounding wrong answer can route a parent into bad decisions for the rest of the loop. Verification catches it.
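
The constraint check is the cheap one. A sketch, assuming your acceptance criteria are a word cap and at least one citation URL:

```python
def constraint_check(output: str, max_words: int = 200, require_citation: bool = True) -> bool:
    """Deterministic gate the parent runs before trusting sub-agent output."""
    if len(output.split()) > max_words:
        return False  # blew the word cap
    if require_citation and "http" not in output:
        return False  # no citation URL present
    return True
```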

Cost math

Sub-agents cost compute. They save context. Whether that tradeoff wins depends on:

  • Parent context size (bigger = sub-agents win more — every extra parent token costs per turn)
  • Sub-agent task scope (narrow = cheap; broad = not worth it)
  • Reuse (is the sub-agent work cached? Can it be?)

Rough heuristic: if the sub-agent's raw input would exceed 5K tokens in the parent's context, delegation probably wins. Under that, doing it in the parent is usually cheaper.
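
A slightly richer version of that heuristic, with illustrative (assumed) overhead numbers — the point is the shape of the comparison, not the constants:

```python
def delegation_wins(raw_tokens: int, remaining_turns: int,
                    summary_tokens: int = 300,
                    spawn_overhead_tokens: int = 2_000) -> bool:
    """Keeping raw material in the parent costs its tokens on every later turn;
    delegation costs a one-off spawn plus a small summary per turn."""
    keep_in_parent = raw_tokens * remaining_turns
    delegate = spawn_overhead_tokens + summary_tokens * remaining_turns
    return delegate < keep_in_parent
```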

Observability

Every delegation should be logged: the prompt, the sub-agent tools, the result, duration. When things go wrong — agent goes off the rails, spend explodes, output is garbage — the delegation log is how you debug.

In the Claude Agent SDK, a PostToolUse hook on the Task tool gives you a single hook point to capture every delegation. Drop it into your audit pipeline.
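
The capture side of such a log might look like this — the hook wiring itself depends on your SDK version, and the field names are ours:

```python
import json
import time

def log_delegation(prompt: str, tools: list[str], result: str,
                   duration_s: float, sink: list) -> dict:
    """Append one delegation record; `sink` stands in for your audit pipeline."""
    record = {
        "ts": time.time(),
        "prompt": prompt,
        "tools": tools,
        "result_preview": result[:500],  # cap stored output; full result lives elsewhere
        "duration_s": duration_s,
    }
    sink.append(json.dumps(record))
    return record
```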

What to build

If you don't have sub-agents yet, the first pattern to build is parallel research with summarized output. High signal, bounded risk, works in one session. Add the specialist personas next as you find sub-tasks that genuinely benefit from narrower prompts.

Context protection (pattern 3) comes last because you need to notice the context-bloat problem first. Once you see a parent agent's context balloon from a single tool call, it becomes obvious — and then you delegate.

The full patterns, with prompt templates and error handling, are in our sub-agent delegation skill.

latestaiagents | MIT License