Single-Agent versus Multi-Agent LLM Systems for Automated Programming: A Controlled Experiment in Software Engineering
Submitted to Transactions on Software Engineering and Methodology, 2025
Recommended citation: L Yu, E Alégroth, P Chatzipetrou, T Gorschek (2025). "Single-Agent versus Multi-Agent LLM Systems for Automated Programming: A Controlled Experiment in Software Engineering." Transactions on Software Engineering and Methodology.
Large Language Models (LLMs) can produce working code, but complex tasks still demand decomposition, planning, and self-checking. Multi-agent systems address this by assigning role-specific prompts and using coordinators to route messages and invoke tools. However, controlled evidence comparing such systems with single-agent setups remains limited. We ran a controlled experiment on 133 LeetCode problems to test whether a multi-agent system outperforms a single agent on code quality. The multi-agent system comprised four agents: a Problem Analyzer, a Solution Designer, a Code Executor, and a Solution Verifier. We also compared both LLM systems with the LeetCode human baseline. Metrics were acceptance rate, cyclomatic complexity, lines of code, and generation time. We replicated the experiment three times. Average acceptance rate was 98.50% (single-agent) and 97.74% (multi-agent), and both exceeded the LeetCode human baseline of 55.99%. The multi-agent system reduced mean cyclomatic complexity by 4.8% and lines of code by 3.6%, but these gains came with a fourfold increase in end-to-end generation time. Quality gains from multiple agents were small and did not offset the added cost (longer generation time, higher token usage, more API calls). For LeetCode-style coding tasks where fast turnaround matters, we recommend defaulting to a single agent and adopting multiple agents only when specific quality targets (e.g., accuracy) justify the cost.
