Multi-agent orchestration went GA in Summer '26. The documentation covers the API surface well enough. What it doesn't cover is why the patterns that look clean in a sandbox fall apart under real load — and which architectural decisions you'll regret at 3 AM when the on-call alert fires.
This post assumes you've already shipped at least one Agentforce implementation. If you're still asking "what is an agent," start there first.
The Three Orchestration Patterns
Every production multi-agent architecture I've worked on maps to one of these patterns or an explicit combination of them. The choice drives latency, cost, and how complex your failure handling gets.
Coordinator Pattern
One agent receives the incoming request, decomposes it into subtasks, delegates to specialized agents, and assembles the final response. The coordinator knows the full context; the workers don't need to.
This is the right choice when subtasks have dependencies — when Agent B's input depends on Agent A's output. The coordinator enforces sequencing.
The latency cost is real: every agent hop adds 1–3 seconds of model inference plus Salesforce's internal routing overhead. Five sequential agents therefore spend 5–15 seconds on inference alone, which is enough to miss most customer-facing SLA targets. Design the chain depth deliberately.
Use when: tasks decompose into sequential steps with hard dependencies, or when you need a single point of accountability for the final response quality.
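A minimal Python sketch of the coordinator's two jobs, under stated assumptions: Step, run_coordinator, and the per-hop latency estimates are hypothetical names, not Agentforce APIs, and each worker agent is modeled as a plain function from context to new facts. It shows that sequential hops add up, so the worst case can be checked against the SLA before any agent runs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    run: Callable[[dict], dict]       # hypothetical worker agent: reads context, returns new facts
    est_seconds: tuple[float, float]  # assumed (best, worst) latency per hop, routing included


def worst_case_latency(steps: list[Step]) -> float:
    # Sequential hops add: the chain pays every hop's worst case in turn.
    return sum(step.est_seconds[1] for step in steps)


def run_coordinator(steps: list[Step], request: dict, sla_seconds: float) -> dict:
    """Run dependent steps in order, feeding each one everything produced so far."""
    worst = worst_case_latency(steps)
    if worst > sla_seconds:
        raise ValueError(f"worst case {worst:.1f}s exceeds the {sla_seconds:.1f}s SLA")
    context = dict(request)
    for step in steps:
        # Step N sees the output of steps 1..N-1: the dependency the coordinator enforces.
        context.update(step.run(context))
    return context
```

With five hops at the top of the 1–3 second range, worst_case_latency returns 15.0, so a 10-second budget rejects the chain up front instead of discovering the miss in production.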
Parallel Executor
The orchestrator fans out to multiple agents simultaneously and merges results. Total latency approaches the slowest individual agent rather than the sum of all agents.
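A small asyncio sketch of why fan-out latency tracks the slowest branch rather than the sum. fake_agent is a hypothetical stand-in that sleeps instead of calling a model; the branch names and durations are made up.

```python
import asyncio
import time


async def fake_agent(name: str, seconds: float) -> str:
    # Stand-in for a worker agent call; the sleep simulates inference latency.
    await asyncio.sleep(seconds)
    return f"{name}: done"


async def fan_out(latencies: dict[str, float]) -> tuple[list[str], float]:
    """Start every branch at once; gather() returns results in input order."""
    start = time.perf_counter()
    results = await asyncio.gather(*(fake_agent(n, s) for n, s in latencies.items()))
    return list(results), time.perf_counter() - start


results, elapsed = asyncio.run(fan_out({"billing": 0.1, "cases": 0.2, "entitlements": 0.3}))
# elapsed lands near the slowest branch (0.3 s), not the 0.6 s sum of all three
print(results, f"{elapsed:.2f}s")
```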
The performance argument is real. The architectural risk is equally real: parallel agents don't coordinate with each other. If Agent A and Agent B both read the same record and either one also writes to it, you have a race. The platform does not prevent this. When write operations are in scope, you have to design idempotent agents and use record locking.
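One way to get both properties, sketched against an in-memory store rather than the platform: RecordStore, the idempotency-key format, and the field names are hypothetical. A per-record lock serializes writers, and a ledger of idempotency keys turns a retried write into a no-op. On the platform itself, SELECT ... FOR UPDATE in Apex SOQL is the usual record-level lock.

```python
import asyncio


class RecordStore:
    """In-memory stand-in for CRM records; not a Salesforce API."""

    def __init__(self) -> None:
        self.records: dict[str, dict] = {}
        self._locks: dict[str, asyncio.Lock] = {}
        self._applied: set[str] = set()  # idempotency keys already processed

    def _lock_for(self, record_id: str) -> asyncio.Lock:
        if record_id not in self._locks:
            self._locks[record_id] = asyncio.Lock()
        return self._locks[record_id]

    async def apply(self, record_id: str, idem_key: str, delta: dict[str, int]) -> bool:
        """Apply numeric increments at most once per key, one writer per record at a time."""
        async with self._lock_for(record_id):
            if idem_key in self._applied:
                return False  # retry of a write that already landed
            record = self.records.setdefault(record_id, {})
            for field_name, amount in delta.items():
                current = record.get(field_name, 0)
                await asyncio.sleep(0)  # yield mid-update: without the lock, this is where writes race
                record[field_name] = current + amount
            self._applied.add(idem_key)
            return True


async def demo() -> tuple[list[bool], int]:
    store = RecordStore()
    applied = await asyncio.gather(
        store.apply("acct-1", "agentA:req-42", {"open_cases": 1}),
        store.apply("acct-1", "agentB:req-42", {"open_cases": 1}),
        store.apply("acct-1", "agentA:req-42", {"open_cases": 1}),  # agent A retried
    )
    return list(applied), store.records["acct-1"]["open_cases"]
```

Both agents' increments land, and agent A's retry is ignored, so the count ends at 2 rather than 1 (lost update) or 3 (double apply).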
A second failure mode is context coherence loss. Each parallel branch gets a copy of the context at fan-out time. If the context is large (say, a 200-message conversation history), you're passing all of it to every branch, so token costs scale linearly with the number of branches. More importantly, if your orchestrator is careless about what context each agent actually needs, you will hit the 32K token limit on downstream agents and get truncation problems that are difficult to reproduce.
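A sketch of explicit per-branch scoping, assuming the context is a flat dict with a history list (a hypothetical shape, not the platform's): each branch declares the fields it reads and how much conversation tail it needs, and receives only that slice. BranchScope and scope_context are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BranchScope:
    """What one parallel branch is allowed to see (hypothetical schema)."""
    fields: tuple[str, ...]   # CRM fields this branch actually reads
    recent_messages: int = 0  # how much conversation tail it needs, if any


def scope_context(full: dict, scope: BranchScope) -> dict:
    """Build the per-branch payload instead of copying the whole context."""
    sliced = {k: full[k] for k in scope.fields if k in full}
    if scope.recent_messages:  # guard: a slice of [-0:] would return the entire history
        sliced["history"] = full.get("history", [])[-scope.recent_messages:]
    return sliced
```

With a 200-message history, a billing branch that declares two fields gets two fields, and a summarizer that asks for ten messages gets ten, instead of every branch paying for all 200.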
Use when: subtasks are truly independent, agents are read-mostly, and you've explicitly scoped what context each branch needs.
Judge-Jury Pattern
Multiple agents independently produce a result; a final "judge" agent evaluates the candidates and selects or synthesizes the best output. Common in content generation, classification, and any scenario where output quality variance is unacceptable.
The honest trade-off: you're running N+1 model calls for every request, so cost scales linearly with N. If the candidates run in parallel, latency is the slowest candidate plus the judge hop; if they run sequentially, it's the sum of all of them plus the judge. Quality improvement is also real but not always proportional to the overhead. In my experience, for structured tasks (data extraction, record updates), a well-prompted single agent with few-shot examples matches or outperforms an ensemble at a fraction of the cost.
Where ensembles genuinely earn their overhead is open-ended reasoning tasks where single-model hallucination risk is high. A three-agent ensemble with a judge catches a meaningful percentage of confident-but-wrong outputs that a single agent would pass through unchallenged.
Use when: output quality variance is a business risk, tasks are open-ended enough that multiple valid approaches exist, and the latency + cost budget supports it.
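A minimal sketch of the shape with the call arithmetic built in: N candidates run in parallel, then one judge hop, for N+1 calls. judge_jury, majority_judge, and fixed_answer are hypothetical names. To keep the example self-contained, the judge is swapped out for a plain majority vote; a real judge would be another model call that can also synthesize, not just pick.

```python
import asyncio
from collections import Counter
from typing import Awaitable, Callable

Candidate = Callable[[str], Awaitable[str]]
Judge = Callable[[str, list[str]], Awaitable[str]]


async def judge_jury(task: str, candidates: list[Candidate], judge: Judge) -> tuple[str, int]:
    """Fan out N candidates in parallel, then one judge hop. Returns (verdict, model_calls)."""
    outputs = await asyncio.gather(*(candidate(task) for candidate in candidates))
    verdict = await judge(task, list(outputs))
    return verdict, len(candidates) + 1  # N candidate calls + 1 judge call


async def majority_judge(task: str, outputs: list[str]) -> str:
    # Stand-in judge: plain majority vote over candidate outputs.
    return Counter(outputs).most_common(1)[0][0]


def fixed_answer(answer: str) -> Candidate:
    # Hypothetical candidate agent that always returns the same answer.
    async def agent(task: str) -> str:
        return answer
    return agent


verdict, calls = asyncio.run(judge_jury(
    "classify case 500A",
    [fixed_answer("refund"), fixed_answer("refund"), fixed_answer("escalate")],
    majority_judge,
))
```

Three candidates answering refund, refund, and escalate produce a refund verdict at a cost of four model calls.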
When Ensembles Are Not Better
This deserves a direct statement because the marketing around multi-agent systems implies more agents always means better results.
For short, well-scoped tasks (look up an account, update a field, send a notification), adding agents adds latency and failure surface with no quality benefit. Every additional model hop is a new place where the context can get misinterpreted, the tool call can fail, or the response can be malformed.
For tasks involving long conversation history, parallel agents that each receive the full context will produce coherent individual outputs, but the orchestrator then has to reconcile N perspectives that were generated without awareness of each other. In practice this creates subtle inconsistencies in the merged response that are hard to detect and hard to explain to users.
The right question isn't "how many agents can I add?" It's "what is the minimum number of agents that keeps concerns cleanly separated?"
What Breaks Silently in Production
These are the failure modes I've seen in production implementations that didn't show up in sandbox testing.
Shared state races in parallel execution. Two agents both update the same related record in the same transaction window. The second write wins silently. No error is thrown. The first agent's work is overwritten. This is particularly dangerous in financial or compliance contexts where every change needs to be intentional.
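The usual defense against silent last-write-wins is optimistic concurrency: read a version, and make the write conditional on that version not having moved. VersionedRecord below is a hypothetical in-memory stand-in; on the platform, comparing a timestamp field such as LastModifiedDate before updating is one way to approximate the same check.

```python
class StaleWriteError(Exception):
    """Raised when a write is based on a version someone else has already replaced."""


class VersionedRecord:
    """Hypothetical record with a version counter; not a Salesforce API."""

    def __init__(self, fields: dict) -> None:
        self.fields = dict(fields)
        self.version = 0

    def read(self) -> tuple[dict, int]:
        return dict(self.fields), self.version

    def write(self, changes: dict, expected_version: int) -> int:
        # Compare-and-set: reject the write if anyone wrote since this caller's read.
        if expected_version != self.version:
            raise StaleWriteError(f"expected v{expected_version}, record is at v{self.version}")
        self.fields.update(changes)
        self.version += 1
        return self.version
```

If Agents A and B both read version 0 and A writes first, B's write raises StaleWriteError instead of overwriting A's change, which turns invisible data loss into an error someone can act on.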
Stale data passed to downstream agents. The coordinator fetches CRM context at request start. A downstream agent that runs 8 seconds later still works from that snapshot. If another process (a human, a Flow, another agent) has modified the underlying records in that window, the downstream agent is making decisions on stale data. At low volume this is invisible. At scale, it surfaces as mysterious inconsistencies.
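A sketch of re-validating the snapshot right before a downstream agent acts on it, assuming you have a cheap way to read a record's current modification timestamp and a way to refetch the record. Both are passed in as hypothetical callables; Snapshot and ensure_fresh are illustrative names, not Salesforce APIs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable


@dataclass
class Snapshot:
    record_id: str
    data: dict
    last_modified: datetime  # the record's modification timestamp when the coordinator fetched it


def ensure_fresh(
    snap: Snapshot,
    current_last_modified: Callable[[str], datetime],  # hypothetical cheap timestamp lookup
    refetch: Callable[[str], Snapshot],                 # hypothetical full re-read
) -> Snapshot:
    """Return the snapshot if it is still current, otherwise a freshly fetched one."""
    if current_last_modified(snap.record_id) > snap.last_modified:
        return refetch(snap.record_id)  # changed since fan-out: don't decide on the old copy
    return snap


# Usage: the record was edited 8 seconds after the coordinator's fetch.
fetched_at = datetime(2026, 6, 1, 9, 0, tzinfo=timezone.utc)
snapshot = Snapshot("001A", {"tier": "silver"}, fetched_at)
edited_at = fetched_at + timedelta(seconds=8)
latest = ensure_fresh(
    snapshot,
    current_last_modified=lambda rid: edited_at,
    refetch=lambda rid: Snapshot(rid, {"tier": "gold"}, edited_at),
)
```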
Context window overflow without a hard error. Agents approaching the token limit start silently truncating the oldest context. The model continues operating on an incomplete picture. This does not throw an exception. The first sign is typically degraded output quality or agents "forgetting" earlier instructions, which is genuinely hard to diagnose in a chain.
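Since the truncation doesn't raise, one option is to check the budget yourself before each hop and fail loudly. The sketch uses a rough 4-characters-per-token heuristic, which is an assumption and not the platform's tokenizer; check_context and ContextBudgetExceeded are hypothetical names.

```python
class ContextBudgetExceeded(Exception):
    """Raised instead of letting the model silently drop the oldest context."""


def estimate_tokens(text: str) -> int:
    # Rough heuristic (~4 characters per token for English text). An assumption:
    # swap in a real tokenizer count if you have one.
    return (len(text) + 3) // 4


def check_context(instructions: str, messages: list[str], limit_tokens: int,
                  reserve_for_output: int = 1_000) -> int:
    """Return the estimated prompt size, or raise if it won't fit with room to answer."""
    used = estimate_tokens(instructions) + sum(estimate_tokens(m) for m in messages)
    available = limit_tokens - reserve_for_output
    if used > available:
        raise ContextBudgetExceeded(f"~{used} tokens of context, only {available} available")
    return used
```

Call it with the downstream agent's actual limit before every hop; when it raises, the orchestrator can summarize or trim deliberately instead of discovering the truncation through degraded output.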
Silent AWU exhaustion. If an agent chain hits the AWU limit for the org or for the current entitlement, it doesn't always fail loudly. Depending on where in the chain the limit is reached, you can get partial execution — some records updated, some not — with no clear error surface. See Post 2 in this series for AWU governance in depth.
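A sketch of making partial execution visible, assuming each step can confirm its own effect (for example, by re-reading the record it updated) and returns False when it can't. run_chain and PartialExecutionError are hypothetical names. Either an exception or a step that silently does nothing produces a report of what completed and what never ran.

```python
from typing import Callable


class PartialExecutionError(Exception):
    def __init__(self, completed: list[str], failed: str, pending: list[str], cause: Exception) -> None:
        super().__init__(f"stopped at {failed!r}: completed={completed}, never ran={pending} ({cause})")
        self.completed = completed
        self.failed = failed
        self.pending = pending
        self.cause = cause


def run_chain(steps: list[tuple[str, Callable[[], bool]]]) -> list[str]:
    """Run steps in order; on any failure, report exactly what did and didn't happen."""
    completed: list[str] = []
    for i, (name, action) in enumerate(steps):
        try:
            confirmed = action()
            cause = None if confirmed else RuntimeError("step returned without confirming its effect")
        except Exception as exc:  # e.g. a limit error that does surface as an exception
            cause = exc
        if cause is not None:
            pending = [n for n, _ in steps[i + 1:]]
            raise PartialExecutionError(completed, name, pending, cause) from cause
        completed.append(name)
    return completed
```

The error carries the completed and pending lists, which is exactly the information you need to decide whether to resume, compensate, or roll back.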
Latency Architecture
For customer-facing workflows, the practical ceiling is around 8–10 seconds total before users start abandoning. For backend automation it's more forgiving, but downstream systems have their own timeouts.
A few rules that have held up in practice:
Keep the coordinator lightweight. Its job is decomposition and assembly, not reasoning. If your coordinator agent is doing complex analysis, that work should be in a specialized agent it delegates to.
Parallelize aggressively within a stage, minimize stages. One orchestration stage with three parallel agents is almost always faster than three sequential stages with one agent each. Model the dependency graph explicitly.
Cache CRM context at the coordinator level and pass references, not full data. Agents that need account data should receive the account ID and fetch what they need, not receive a 50-field JSON blob that came from the coordinator's initial lookup. This also reduces token pressure across the chain.
When mixing CRM data with external API calls in the same chain, give the API callout agents explicit timeouts and fallback behaviors. An external API that takes 29 seconds to time out will blow through a 10-second SLA. The Salesforce callout limit of 120 seconds applies per transaction, not per agent, but a hung external call still blocks the entire chain.
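A sketch of bounding an external call inside the chain, using asyncio.wait_for as a stand-in for whatever timeout mechanism your callout layer has (in Apex, HttpRequest.setTimeout sets the per-callout timeout). with_timeout and the fallback value are hypothetical; the returned flag lets the coordinator tell the user, or the next agent, that it is working with degraded data.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_timeout(call: Callable[[], Awaitable[T]], timeout_s: float, fallback: T) -> tuple[T, bool]:
    """Bound a slow external call and degrade instead of blocking the chain.

    Returns (value, True) on success, or (fallback, False) if the call timed out.
    """
    try:
        result = await asyncio.wait_for(call(), timeout=timeout_s)  # cancels the call on timeout
    except asyncio.TimeoutError:
        return fallback, False
    return result, True


async def slow_credit_check() -> dict:
    # Hypothetical external API that hangs far longer than the SLA allows.
    await asyncio.sleep(5)
    return {"score": 700}
```

Give each callout a timeout well inside the stage's share of the end-to-end budget, so one slow dependency costs you a degraded answer rather than the whole request.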
What Isn't Ready Yet
A few honest limitations of the GA release as of Summer '26.
Cross-agent memory is still per-session. There is no built-in mechanism for agents to share persistent state between sessions. If you need Agent A's findings from Monday to be available to Agent B on Tuesday, you're building that persistence layer yourself, typically with custom object records. (Custom Metadata is a poor fit for this: it's deployed configuration, and Apex can't update it with ordinary DML.)
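A minimal sketch of that persistence layer, with an in-memory list standing in for a custom object; Finding and FindingStore are hypothetical names. Findings are keyed by the CRM record they describe, so Tuesday's agent can query Monday's findings and ignore anything too old to trust.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Finding:
    subject_id: str  # the CRM record the finding is about
    agent: str       # which agent produced it
    summary: str
    created_at: datetime


class FindingStore:
    """In-memory stand-in for a custom object holding agent findings (hypothetical)."""

    def __init__(self) -> None:
        self._rows: list[Finding] = []

    def save(self, finding: Finding) -> None:
        self._rows.append(finding)

    def recent(self, subject_id: str, max_age: timedelta, now: datetime) -> list[Finding]:
        """Findings about one record, newest first, ignoring anything older than max_age."""
        cutoff = now - max_age
        hits = [f for f in self._rows if f.subject_id == subject_id and f.created_at >= cutoff]
        return sorted(hits, key=lambda f: f.created_at, reverse=True)


# Usage: Agent A records a finding on Monday; Agent B reads it on Tuesday.
monday = datetime(2026, 6, 1, 10, tzinfo=timezone.utc)
store = FindingStore()
store.save(Finding("001A", "risk_agent", "three late payments this quarter", monday))
store.save(Finding("001A", "risk_agent", "outdated note", monday - timedelta(days=30)))
tuesday_view = store.recent("001A", max_age=timedelta(days=7), now=monday + timedelta(days=1))
```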
Agent-to-agent trust model is coarse. When a coordinator delegates to a worker agent, the worker runs with the coordinator's permissions. There's no fine-grained capability scoping at the agent-to-agent call level. For privilege-sensitive operations, this means you can't easily create a coordinator that has broad read access but narrow write access for specific workers.
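Until the platform scopes capabilities per call, one workaround is an application-level allowlist that the coordinator checks before dispatching a worker's tool call. The worker names and object/operation pairs below are hypothetical. This is a guardrail in your own orchestration code, not a security boundary: the worker still runs with the coordinator's permissions.

```python
class CapabilityError(PermissionError):
    """Raised when a worker attempts an operation outside its allowlist."""


# Hypothetical per-worker allowlist of (object, operation) pairs.
CAPABILITIES: dict[str, set[tuple[str, str]]] = {
    "case_summarizer": {("Case", "read"), ("Account", "read")},
    "case_updater": {("Case", "read"), ("Case", "write")},
}


def authorize(worker: str, sobject: str, operation: str) -> None:
    """Check before dispatch; unknown workers get no capabilities at all."""
    allowed = CAPABILITIES.get(worker, set())
    if (sobject, operation) not in allowed:
        raise CapabilityError(f"{worker} may not {operation} {sobject}")
```

Defaulting unknown workers to an empty set keeps the failure mode closed: a newly added worker can do nothing until someone grants it capabilities explicitly.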
Observability tooling is still catching up. The native Agentforce audit logs give you request/response at the agent boundary, but tracing a decision through a multi-agent chain — understanding why a specific worker produced a specific output — requires instrumenting your own metadata. This is the operability gap that will bite you first in production.
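A sketch of the kind of metadata worth instrumenting: a trace ID shared by every agent in a request, plus parent links so you can walk back from a worker's output to the decisions that led to it. This borrows the trace/span model that tools like OpenTelemetry use; Tracer and Span are hypothetical names, not an Agentforce API.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    trace_id: str             # shared by every agent hop in one request
    span_id: str
    parent_id: Optional[str]  # the agent that delegated to this one
    agent: str
    inputs: dict
    output: Optional[str] = None


class Tracer:
    def __init__(self) -> None:
        self.spans: list[Span] = []

    def start(self, agent: str, inputs: dict, parent: Optional[Span] = None) -> Span:
        span = Span(
            trace_id=parent.trace_id if parent else uuid.uuid4().hex,
            span_id=uuid.uuid4().hex[:16],
            parent_id=parent.span_id if parent else None,
            agent=agent,
            inputs=inputs,
        )
        self.spans.append(span)
        return span

    def path_to(self, span: Span) -> list[str]:
        """Walk parent links to show which agents led to this span, root first."""
        by_id = {s.span_id: s for s in self.spans}
        path: list[str] = []
        current: Optional[Span] = span
        while current is not None:
            path.append(current.agent)
            current = by_id.get(current.parent_id) if current.parent_id else None
        return list(reversed(path))
```

Storing the inputs on each span is what makes "why did this worker produce this output" answerable after the fact.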
None of these are blockers, but they're constraints to design around rather than assume away.
Next in this series: AWU cost governance, audit trail design, and rollback strategies for agent chains that modify records at scale.
Questions or war stories from your own implementations? LinkedIn.