AgentVet - Agents that do the actual work

Abstract

The rush to deploy multi-agent AI systems has outpaced the evidence supporting them. We reviewed the three most comprehensive empirical studies available: Google Research/MIT's 260-configuration scaling analysis, Stanford's equal-budget comparison, and production economics from 47 real-world deployments. The findings challenge the prevailing narrative that adding agents reliably improves results.

Central finding: Task decomposability, not complexity, determines parallel agent value. Parallel agents deliver up to +80.8% improvement on decomposable tasks but cause up to -70% degradation on sequential tasks. On equal compute budgets, a well-prompted single agent matches or outperforms most multi-agent architectures.

Key Findings

Decomposability, Not Complexity

Two tasks with near-identical complexity scores (0.41 vs 0.42) produced opposite results. Finance-Agent: +80.8% improvement. PlanCraft: -70% degradation. The difference: could the work be split into independent sub-problems?

Task	Complexity	Decomposability	Result
Finance-Agent	0.41	High	+80.8%
PlanCraft	0.42	Low	-70%

The 3-Agent Sweet Spot

Agents	Improvement	Cost	Efficiency
1	Baseline	1x	1.0
2	+15-30%	1.6x	0.5-0.8
3	+30-80%	2.5-3x	0.4-1.0
4-5	+5-15%	3.5-5x	0.1-0.3
6+	+2-8%	6-8x	< 0.1
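One way to read the table is as improvement bought per unit of cost. The sketch below takes each row's figures at face value, uses the midpoint of every range, and divides gain by cost multiple. This ratio is an illustrative metric introduced here, not the table's Efficiency column, whose formula the source does not state; the use of midpoints is also an assumption.

```python
# Ranges copied from the table above: (agents, improvement in %, cost multiple).
ROWS = [
    ("2", (15, 30), (1.6, 1.6)),
    ("3", (30, 80), (2.5, 3.0)),
    ("4-5", (5, 15), (3.5, 5.0)),
    ("6+", (2, 8), (6.0, 8.0)),
]


def midpoint(rng):
    """Midpoint of a (low, high) range; a simplifying assumption."""
    low, high = rng
    return (low + high) / 2


def gain_per_cost(improvement, cost):
    """Percentage points of improvement per 1x of cost (hypothetical metric)."""
    return midpoint(improvement) / midpoint(cost)


ratios = {label: gain_per_cost(imp, cost) for label, imp, cost in ROWS}
best = max(ratios, key=ratios.get)

for label, ratio in ratios.items():
    print(f"{label:>3} agents: {ratio:5.1f} points per 1x cost")
print("best ratio at", best, "agents")
```

Under these assumptions the ratio peaks at 3 agents and falls off sharply beyond that, which matches the shape the table describes.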

The 68% Rule

Of 47 production multi-agent deployments analyzed, 68% were over-engineered. A well-architected single agent delivers 92% of results at 28% of the cost.

100K Token Threshold

Parallel agents become valuable above ~100K tokens and degrade results below ~50K tokens. Below 10K tokens, coordination overhead alone exceeds any information gain.
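The thresholds above can be expressed as a small lookup. This is a minimal sketch: the function name and return strings are hypothetical, the cutoffs are treated as exact even though the text gives them as approximate, and the 50K-100K band, which the text does not characterize, is labeled a gray zone as an assumption.

```python
def size_regime(input_tokens: int) -> str:
    """Map an input size in tokens to the regime described in the text."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    if input_tokens < 10_000:
        return "single agent: coordination overhead exceeds any gain"
    if input_tokens < 50_000:
        return "single agent: parallel agents degrade results"
    if input_tokens <= 100_000:
        # Not covered by the text; treating it as undecided is an assumption.
        return "gray zone: measure before parallelizing"
    return "parallel agents can add value"


for size in (5_000, 30_000, 75_000, 250_000):
    print(f"{size:>7} tokens -> {size_regime(size)}")
```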

Error Amplification

Architecture	Error Amplification
Single agent	1x (baseline)
Independent MAS (no merge)	17.2x
Centralized MAS	4.4x
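To get a feel for these factors, the sketch below applies them to a baseline error rate. It assumes the factor multiplies the single-agent error rate directly and caps the result at 100%; the source does not state exactly what quantity is amplified, so treat the output as an illustration rather than a prediction. The function name is hypothetical.

```python
# Factors copied from the table above.
AMPLIFICATION = {
    "single agent": 1.0,
    "independent MAS (no merge)": 17.2,
    "centralized MAS": 4.4,
}


def projected_error_rate(base_rate: float, architecture: str) -> float:
    """Scale a baseline error rate by the architecture's factor, capped at 1.0.

    Assumption: amplification acts as a plain multiplier on the error rate.
    """
    if not 0.0 <= base_rate <= 1.0:
        raise ValueError("base_rate must be between 0 and 1")
    return min(1.0, base_rate * AMPLIFICATION[architecture])


for name in AMPLIFICATION:
    print(f"{name}: {projected_error_rate(0.02, name):.1%}")
```

With a 2% baseline, this reading gives roughly 34% for independent agents and roughly 9% for a centralized design, which is why an unmerged fan-out is the riskiest option.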

Decision Tree

START: What are you trying to do?
|
+-- SIMPLE LOOKUP or Q&A?
|   +-- SINGLE AGENT (1x cost, 92%+ quality)
|
+-- SEQUENTIAL REASONING (A then B then C)?
|   +-- Try SAS-L first (single agent with a larger reasoning budget; matches MAS at 1x cost)
|      If not possible: SINGLE AGENT
|
+-- SINGLE SOURCE < 100K tokens?
|   +-- SINGLE AGENT (context window sufficient)
|
+-- PARALLELIZABLE, READ-HEAVY, MULTIPLE SOURCES?
|   +-- 100K-500K tokens -> 2-3 PARALLEL AGENTS
|   +-- > 500K tokens -> 3-4 PARALLEL AGENTS
|   +-- Need speed? -> 2-3 PARALLEL (cuts wall-clock)
|
+-- BROAD RESEARCH (multi-domain)?
|   +-- ONYX PATTERN: 3 agents x up to 8 cycles
|      Never deeper than 2 levels
|
+-- ENSEMBLE VOTING (classification)?
    +-- 3-5 agents with WEIGHTED VOTING
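The tree above can be sketched as a function that checks the branches top to bottom. The `Task` fields, the `kind` labels, and the return strings are hypothetical names chosen for this example. Two behaviors are assumptions, since the tree does not cover them: a multi-source read task under 100K tokens with no speed requirement falls back to a single agent, and so does any unrecognized task kind.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # kind: "lookup", "sequential", "parallel_read", "broad_research", "ensemble"
    kind: str
    input_tokens: int = 0
    single_source: bool = False
    need_speed: bool = False
    sas_l_available: bool = True


def recommend(task: Task) -> str:
    """Walk the decision tree in the same order as the diagram."""
    if task.kind == "lookup":
        return "single agent"
    if task.kind == "sequential":
        return "SAS-L" if task.sas_l_available else "single agent"
    if task.single_source and task.input_tokens < 100_000:
        return "single agent"
    if task.kind == "parallel_read":
        if task.input_tokens > 500_000:
            return "3-4 parallel agents"
        if task.input_tokens >= 100_000 or task.need_speed:
            return "2-3 parallel agents"
        return "single agent"  # assumption: case not covered by the tree
    if task.kind == "broad_research":
        return "Onyx pattern: 3 agents x up to 8 cycles, max 2 levels"
    if task.kind == "ensemble":
        return "3-5 agents with weighted voting"
    return "single agent"  # assumption: conservative default


print(recommend(Task("parallel_read", input_tokens=200_000)))
```

Because the single-source check comes before the later branches, a small single-source task resolves to a single agent regardless of its kind, mirroring the ordering of the diagram.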

Merge Strategies

Strategy	Best For	vs Single Agent
Independent (no merge)		-70% worst case
Union (take all)	Exploration	+10-30%
Weighted voting	Classification	+35%
Orchestrator synthesis	Research, strategy	+80.8% best
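Two of these strategies are simple enough to sketch directly. The code below shows one plausible reading of union and weighted voting; the function names and input shapes are assumptions made for illustration, and orchestrator synthesis is omitted because it requires a model call.

```python
from collections import defaultdict


def union_merge(findings_per_agent):
    """Union (take all): keep every distinct finding in first-seen order."""
    seen = set()
    merged = []
    for findings in findings_per_agent:
        for item in findings:
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged


def weighted_vote(votes):
    """Weighted voting over (label, weight) pairs; the highest total wins.

    Ties go to the label that appeared first, a choice made for this sketch.
    """
    if not votes:
        raise ValueError("votes must not be empty")
    totals = defaultdict(float)
    for label, weight in votes:
        totals[label] += weight
    return max(totals, key=totals.get)


print(union_merge([["a", "b"], ["b", "c"]]))
print(weighted_vote([("spam", 0.9), ("ham", 0.6), ("ham", 0.5)]))
```

In the voting example, the two lower-weight "ham" votes total 1.1 and outweigh the single 0.9 "spam" vote, so the result is "ham".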

Cost Analysis

Metric	Single Agent	Multi-Agent (3)	Multiplier
Infrastructure	$8,200/mo	$12,400/mo	1.5x
Token costs	$180/mo	$780/mo	4.3x
Total TCO	$8,380/mo	$13,180/mo	1.57x
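The multipliers in the table follow from the monthly figures, and the same arithmetic shows where the extra spend comes from. The sketch below recomputes them; the variable names are hypothetical, and the dollar amounts are copied from the table.

```python
# Monthly costs from the table above: (single agent, multi-agent with 3 agents).
COSTS = {
    "infrastructure": (8_200, 12_400),
    "tokens": (180, 780),
}

single_total = sum(single for single, _ in COSTS.values())
multi_total = sum(multi for _, multi in COSTS.values())

multipliers = {name: multi / single for name, (single, multi) in COSTS.items()}
multipliers["total"] = multi_total / single_total

# Share of the total monthly increase attributable to token spend.
token_single, token_multi = COSTS["tokens"]
token_share_of_increase = (token_multi - token_single) / (multi_total - single_total)

for name, value in multipliers.items():
    print(f"{name}: {value:.2f}x")
print(f"token share of increase: {token_share_of_increase:.1%}")
```

Although token costs grow fastest in relative terms (about 4.3x), they account for only 12.5% of the $4,800 monthly increase in these figures; the rest is infrastructure.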

Quick Reference

USE PARALLEL AGENTS WHEN:
   - Task splits into independent sub-problems
   - Input > 100K tokens or multiple documents
   - Quality-sensitive (missing findings is expensive)
   - Breadth-first exploration needed

USE SINGLE AGENT WHEN:
   - Sequential reasoning chain (A then B then C)
   - Small input (< 50K tokens)
   - Cost-constrained
   - Simple lookup / Q&A

OPTIMAL SETTINGS:
   - Agents: start with 2, rarely exceed 3
   - Architecture: centralized, 2 levels max
   - Merge: orchestrator synthesis for research
   - Budget: 2.5-3x tokens for 30-80% quality gain
   - First try: SAS-L before adding agents

Conclusion

Parallel agents are a powerful tool, but not a default architecture. They win when tasks are decomposable, read-heavy, and quality-sensitive. They lose when tasks are sequential, tightly coupled, or cost-constrained.

When parallel agents are warranted, the optimal architecture is 2-3 agents with a centralized orchestrator, never deeper than 2 levels. Before adding agents, try increasing the reasoning budget of a single agent. It often delivers comparable quality at a fraction of the cost.
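The centralized, two-level layout can be sketched with the standard library. This is a minimal illustration, not a reference implementation: `run_worker` and `synthesize` are hypothetical stand-ins for model calls, and the cap of 3 workers reflects the 2-3 agent recommendation.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 3  # the recommendation is 2-3 parallel agents


def run_worker(subtask: str) -> str:
    """Hypothetical stand-in for one agent working on an independent sub-problem."""
    return f"findings for {subtask}"


def synthesize(results: list[str]) -> str:
    """Hypothetical stand-in for the orchestrator's synthesis step."""
    return "\n".join(results)


def orchestrate(subtasks: list[str]) -> str:
    """Level 1 (orchestrator) fans out to level 2 (workers), then merges.

    Workers never spawn further agents, which keeps the depth at 2 levels.
    """
    if not subtasks:
        raise ValueError("at least one subtask is required")
    if len(subtasks) == 1:
        # A single sub-problem needs no fan-out: fall back to one agent.
        return run_worker(subtasks[0])
    workers = min(MAX_WORKERS, len(subtasks))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order.
        results = list(pool.map(run_worker, subtasks))
    return synthesize(results)


print(orchestrate(["revenue filings", "competitor pricing", "analyst notes"]))
```

The single-subtask shortcut encodes the advice to prefer one agent whenever the work does not actually split.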

The best multi-agent system is the one you didn't build, because a single agent was already enough.

References

Kim et al., "Towards a Science of Scaling Agent Systems," Google Research/MIT, arXiv 2512.08296 (2025)
Tran & Kiela, "Single-Agent vs Multi-Agent Under Equal Thinking Token Budgets," Stanford, arXiv 2604.02460 (2026)
UIUC Token Cost Study, arXiv 2505.18286 (2025)
Databricks State of AI Agents Report (2026)
Onyx Deep Research, open source, onyx.app
Iterathon Multi-Agent Economics, 47 production deployments
Microsoft Azure SRE Agent, reversed multi-agent to single-agent
Cursor 2.0 Parallel Agents, 8 agents, 25-35% token premium
Gartner, 33% of enterprise apps will include agentic AI by 2028