Code generation is widely regarded as a key technique for increasing the automation and ultimate quality of software development. Nevertheless, existing code generation approaches usually concentrate on a single stage of the software development process (i.e., the coding stage) and neglect other stages that are crucial for reducing complexity and ensuring quality. Organizing and carrying out multiple stages of software development requires collaborative teamwork. To this end, this paper presents a self-collaboration code generation framework that employs large language models (LLMs), exemplified by ChatGPT. Specifically, multiple LLMs take on distinct roles through role instructions and form a team that addresses code generation tasks collaboratively and interactively, without human intervention. To showcase the framework, we assemble an elementary team of three ChatGPT roles (i.e., analyst, coder, and tester), corresponding to the analysis, coding, and testing stages of software development. We conduct comprehensive experiments on various code generation benchmarks. The experimental results indicate that self-collaboration code generation achieves a relative improvement of 29.9% to 47.1% over naive direct code generation, reaching state-of-the-art performance and even surpassing GPT-4.
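The three-role team described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the wording of the role instructions, the `self_collaborate` function, the `max_rounds` limit, the convention that the tester replies `PASS`, and the `scripted_llm` stand-in that replaces a real ChatGPT call so the example runs offline.

```python
from typing import Callable, Dict, List, Tuple

# (role_instruction, message) -> reply. Stands in for a chat-model call.
LLM = Callable[[str, str], str]

# Hypothetical role instructions; the paper's actual prompts are not shown here.
ROLE_INSTRUCTIONS: Dict[str, str] = {
    "analyst": "You are a requirements analyst. Break the task into a short plan.",
    "coder": "You are a developer. Write or repair code to satisfy the plan and any test report.",
    "tester": "You are a tester. Reply 'PASS' if the code looks correct, otherwise describe the defect.",
}


def self_collaborate(
    requirement: str, llm: LLM, max_rounds: int = 3
) -> Tuple[str, List[Tuple[str, str]]]:
    """Run analyst -> coder -> tester, repeating coder/tester until PASS or max_rounds."""
    transcript: List[Tuple[str, str]] = []

    # Analysis stage: turn the requirement into a plan.
    plan = llm(ROLE_INSTRUCTIONS["analyst"], requirement)
    transcript.append(("analyst", plan))

    # Coding stage: produce a first draft from the requirement and the plan.
    code = llm(ROLE_INSTRUCTIONS["coder"], f"Requirement:\n{requirement}\n\nPlan:\n{plan}")
    transcript.append(("coder", code))

    # Testing stage: the tester reviews, and the coder repairs on failure.
    for _ in range(max_rounds):
        report = llm(ROLE_INSTRUCTIONS["tester"], f"Requirement:\n{requirement}\n\nCode:\n{code}")
        transcript.append(("tester", report))
        if report.strip().upper().startswith("PASS"):
            break
        code = llm(ROLE_INSTRUCTIONS["coder"], f"Code:\n{code}\n\nTest report:\n{report}")
        transcript.append(("coder", code))

    return code, transcript


def scripted_llm(role_instruction: str, message: str) -> str:
    """Offline stand-in for a real model so the sketch runs without network access."""
    if "analyst" in role_instruction:
        return "1. Define add(a, b). 2. Return the sum."
    if "tester" in role_instruction:
        return "PASS" if "a + b" in message else "add() returns a - b; expected a + b."
    # Coder: the first draft is deliberately buggy; a test report triggers the fix.
    if "Test report:" in message:
        return "def add(a, b):\n    return a + b"
    return "def add(a, b):\n    return a - b"


if __name__ == "__main__":
    final_code, transcript = self_collaborate(
        "Write add(a, b) that returns the sum.", scripted_llm
    )
    for role, text in transcript:
        print(f"[{role}] {text}")
```

Swapping `scripted_llm` for a function that sends the role instruction as a system message to a real chat model would be the natural next step. The loop structure and stopping rule above are one plausible reading of "collaboratively and interactively", not a description of the authors' protocol.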