Automated unit test generation is critical for software quality, but traditional structure-driven methods often lack the semantic understanding required to produce realistic inputs and oracles. Large language models (LLMs) address this limitation by leveraging their extensive data-driven knowledge of code semantics and programming patterns. To analyze the state of the art in this domain, we conducted a systematic literature review of 115 publications published between May 2021 and August 2025. We propose a taxonomy based on the unit test generation lifecycle that divides the process into a generative phase, which creates test artifacts, and a quality assurance phase, which refines them. Our analysis reveals that prompt engineering, owing to its flexibility, has emerged as the dominant approach for utilizing LLMs, accounting for 89% of the studies. We find that iterative validation and repair loops have become the standard mechanism for making generated tests usable, significantly improving compilation and execution pass rates. However, critical challenges remain, including weak fault detection capabilities and the lack of standardized benchmarks. We conclude with a roadmap for future research that emphasizes the progression toward autonomous testing agents and hybrid systems that combine LLMs with traditional software engineering tools.
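To illustrate the iterative validation and repair loop identified above, the following is a minimal sketch, not taken from any surveyed paper: it asks an LLM for a pytest suite, runs it, and feeds compiler or runtime errors back to the model for repair. The `query_llm` function is a hypothetical placeholder for whichever chat-completion backend is used, and the prompts and round budget are illustrative assumptions only.

```python
# Minimal sketch of a "generate -> validate -> repair" loop for
# LLM-based unit test generation. `query_llm` is a hypothetical
# placeholder; plug in a real chat-completion client to use it.
import subprocess
import tempfile
from pathlib import Path


def query_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with an actual API client."""
    raise NotImplementedError("plug in an LLM backend here")


def generate_and_repair(focal_code: str, max_rounds: int = 3) -> str | None:
    """Ask the LLM for a pytest file, then iteratively repair it using
    execution feedback until it passes or the round budget is exhausted."""
    prompt = (
        "Write a pytest test suite for the following Python code. "
        "Return only the test file contents.\n\n" + focal_code
    )
    for _ in range(max_rounds):
        test_code = query_llm(prompt)
        with tempfile.TemporaryDirectory() as tmp:
            test_file = Path(tmp) / "test_generated.py"
            test_file.write_text(test_code)
            # Validation step: run the generated tests with pytest.
            result = subprocess.run(
                ["pytest", "-q", str(test_file)],
                capture_output=True, text=True,
            )
        if result.returncode == 0:  # tests compile and pass
            return test_code
        # Repair step: feed the failure output back to the model.
        prompt = (
            "The previous test suite failed. Fix it.\n\n"
            f"Test code:\n{test_code}\n\n"
            f"Error output:\n{result.stdout}{result.stderr}"
        )
    return None  # no usable test suite within the budget
```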