Why does claude -p subprocess drop intermediate turns in a multi-agent system?

claude -p returns only the final text of the last agent turn. Every intermediate turn — tool calls, reasoning text, in-progress output — is discarded. Use claude-agent-sdk's async query() generator instead to receive every turn as it happens.

How do I stream real-time output from parallel Claude agents in Python?

Use claude-agent-sdk's query() async generator with asyncio.gather for parallel execution. Each AssistantMessage yields one turn's text blocks, so your orchestrator receives intermediate output as tokens arrive — not just the final response.

What does setting_sources=['project', 'local'] do in claude-agent-sdk?

It prevents sub-agents from loading your global ~/.claude/CLAUDE.md and all MCP server tool schemas. Without it, each agent call loads thousands of tokens of tool schemas the sub-agent never uses — at 20 calls/hour that's 60K tokens of context noise.

How do I fix a Claude sub-agent that ignores a new system prompt?

Delete the session registry file. A stale session_id carries its old context, including any output format it was trained on in that session. Start fresh: delete the registry and let the agent create a new session on the next run.

[Claude Code] claude-agent-sdk vs subprocess: Why Intermediate Turns Disappear

TL;DR

claude -p subprocess drops all intermediate turns — tool calls, reasoning, in-progress output — returning only the final text. Migrating to claude-agent-sdk with query() async generator gives real-time streaming of every turn, plus session resume and parallel execution.

Plain-Language Version: Why Your Multi-Agent System Only Shows the Final Answer

Imagine asking three assistants to research different topics simultaneously, but you can only see what each one writes in their final report — not their working notes, not their intermediate findings, nothing until they are done. That is what happens when you run AI agents as subprocesses: you get the finished output, but miss everything in between.

The claude-agent-sdk fixes this by providing a streaming interface. Instead of waiting for the subprocess to finish, your orchestrator receives every intermediate step as it happens — tool calls, reasoning, partial outputs — in real time. This article covers the migration from subprocess to SDK, the setting_sources trap that loads thousands of unnecessary tokens into each sub-agent, and how to handle stale sessions that ignore new instructions.

Preface

Running a subprocess is like sending a letter with a return envelope. You know what you sent, and eventually you get one reply. What you don't see is everything that happened in between — the thinking, the tool calls, the intermediate drafts. For a multi-agent system that needs to display real-time output from parallel agents, the letter-and-reply model isn't enough.

This is about migrating from claude -p subprocess to claude-agent-sdk, what changes, and what the subprocess version silently drops.

What's Wrong with `claude -p`?

The initial orchestrator used subprocess.Popen with claude -p:

result = subprocess.run(
    ["claude", "-p", task, "--output-format", "text"],
    capture_output=True, text=True
)
output = result.stdout

This works for single-turn, fire-and-forget calls. It breaks for anything else.

What claude -p returns: Only the final text message of the last agent turn. If an agent makes three tool calls, emits reasoning text between them, and finally outputs a summary — the subprocess returns only the summary. Every intermediate turn is discarded.

For a MUD-style terminal display where multiple agents run in parallel and stream their output in real-time, this means the user sees nothing until each agent finishes — then all output arrives at once. The UX is identical to no streaming at all.

--output-format stream-json --verbose exists, but parsing the raw event stream is fragile, session resumption is manual, and it's still a subprocess with all the same lifecycle limitations.

How Does the SDK Approach Fix This?

claude-agent-sdk (Python) provides query(), an async generator that yields messages per turn — including every intermediate turn between tool calls.

from claude_agent_sdk import ClaudeAgentOptions, query
from claude_agent_sdk.types import AssistantMessage, ResultMessage, SystemMessage, TextBlock

async def run_agent(name: str, task: str, session_id: str | None = None):
    options = ClaudeAgentOptions(
        system_prompt=AGENT_CONFIGS[name]["system"],
        disallowed_tools=["Task", "ExitPlanMode", "AskUserQuestion"],
        permission_mode="bypassPermissions",
        cwd=str(BASE_DIR),
        setting_sources=["project", "local"],
        **({"resume": session_id} if session_id else {}),
    )

    captured_session_id = session_id
    async for message in query(prompt=task, options=options):
        if isinstance(message, SystemMessage) and message.subtype == "init":
            captured_session_id = message.data.get("session_id")
        elif isinstance(message, AssistantMessage):
            for block in message.content:
                if isinstance(block, TextBlock) and block.text.strip():
                    yield block.text, captured_session_id
        elif isinstance(message, ResultMessage):
            captured_session_id = message.session_id or captured_session_id

    return captured_session_id

Each AssistantMessage corresponds to one agent turn. Text emitted between tool calls appears as its own TextBlock. The orchestrator gets every intermediate output as it happens.

Session Resume

The SDK supports session resumption: pass a previous session_id and the agent picks up where it left off, with its full context intact.

# First run — no session
session_id = await run_agent("nanase", "check system health")

# Store session_id
registry[name] = session_id
save_registry(registry)

# Follow-up — resume session
session_id = await run_agent("nanase", "what was the root cause?", session_id=registry.get(name))

Session IDs come from ResultMessage.session_id (preferred) or SystemMessage.data["session_id"] as fallback. Store them in a registry keyed by agent name.

One caveat: if the agent's system prompt or output format changes, delete the registry. Old sessions carry stale context that conflicts with the new instructions. A stale session that was trained on NDJSON output will keep outputting NDJSON even if the new system prompt no longer requests it.

How Do You Run Multiple Agents in Parallel?

Multiple agents run concurrently with asyncio.gather:

assignments = [
    ("nanase", "check docker containers"),
    ("futaba", "review the diff in /tmp/patch.py"),
    ("shion", "what positions should I open today"),
]

results = await asyncio.gather(*[
    run_agent(name, task, registry.get(name))
    for name, task in assignments
])

Each agent runs independently, yields output as it produces it. The orchestrator multiplexes output to the terminal (or to separate display panels).

For interactive prompts — when an agent needs input before continuing — serialize with a lock:

_input_lock = asyncio.Lock()

async def prompt_user(agent_name: str, question: str) -> str:
    async with _input_lock:
        return await asyncio.to_thread(input, f"\n{agent_name} ❓ > ")

The lock prevents two agents from prompting simultaneously. asyncio.to_thread keeps the event loop unblocked while waiting for user input.

Disallowed Tools

Always set:

disallowed_tools=["Task", "ExitPlanMode", "AskUserQuestion"]

Task — prevents sub-agents from spawning further sub-agents (breaks the orchestrator's control model)
ExitPlanMode — prevents agents from entering plan mode, which blocks indefinitely
AskUserQuestion — prevents agents from using the built-in question UI, which bypasses the orchestrator's input handling

Without these, a sub-agent can spawn its own sub-agents, stall in plan mode, or pop a question dialog that the orchestrator has no way to respond to.

setting_sources

setting_sources=["project", "local"]

Without this, sub-agents inherit ~/.claude/CLAUDE.md and all MCP server tool schemas from the user's global settings. For a system with many MCP servers, this means each agent turn loads thousands of tokens of tool schemas that the sub-agent will never use. setting_sources=["project", "local"] limits loading to project-level settings only, skipping global MCP schemas.

The savings are significant for agents that run frequently. An orchestrator making 20 agent calls per hour with 3K tokens of unnecessary MCP schema per call is burning 60K tokens on context noise.

Takeaways

What cost the most time: Understanding what claude -p actually returns. The docs describe it as "run a prompt and get the result" — which is accurate, but buries the key limitation: "the result" means the final text of the final turn, not the accumulated output of all turns. This isn't a bug, it's a design choice for single-turn use. The SDK is the intended tool for anything more complex.

Transferable diagnostics:

Subprocess output is incomplete or missing tool call results → you're using claude -p and the intermediate turns aren't returned. Migrate to the SDK.
Agent ignores new system prompt despite fresh run → session registry has a stale ID. Delete the registry file.
MCP tool schemas appearing in agent context when they shouldn't → setting_sources is loading ~/.claude/CLAUDE.md which references the MCP servers. Set setting_sources=["project", "local"].

The pattern that applies everywhere: A subprocess is the right tool for fire-and-forget. A streaming async generator is the right tool for anything that needs intermediate output, session state, or parallel composition. Choosing the wrong primitive costs significant refactor time once the system is built.

Migration Checklist

From claude -p subprocess to claude-agent-sdk:

Replace subprocess.run(["claude", "-p", task]) with async for message in query(prompt=task, options=options).
Add disallowed_tools=["Task", "ExitPlanMode", "AskUserQuestion"].
Set permission_mode="bypassPermissions" if the agent needs tool access without prompts.
Set setting_sources=["project", "local"] to avoid loading global MCP schemas.
Capture session_id from ResultMessage and store in a registry.
Pass resume=session_id for follow-up calls.
Use asyncio.gather for parallel agents, asyncio.to_thread + Lock for interactive prompts.

Also in this series: The Debate System — Multiple Claude Instances as Peer Reviewers · Mandatory Instructions That Actually Stick