PALM: Synergizing Program Analysis and LLMs to Enhance Rust Unit Test Coverage

Unit testing is essential for ensuring software reliability and correctness. Classic search-based software testing (SBST) methods and concolic-execution-based approaches to unit test generation often fail to achieve high coverage because they struggle with complex program units, such as those involving branching conditions and external dependencies. Recent work has increasingly used large language models (LLMs) to generate test cases, improving test quality by supplying better context and correcting errors in the model's output. However, these methods rely on fixed prompts, which results in relatively low compilation success rates and coverage. This paper presents PALM, an approach that leverages LLMs to generate high-coverage unit tests. PALM performs program analysis to identify the branching conditions within a function and combines them into path constraints. These constraints, together with relevant contextual information, are used to construct prompts that guide the LLM in generating unit tests. We implement the approach and evaluate it on 15 open-source Rust crates. Experimental results show that, within two to three hours, PALM significantly improves test coverage over classic methods, with overall project coverage increasing by more than 50% in some instances. Its generated tests achieve an average coverage of 72.30%, comparable to human-written tests (70.94%), which highlights the potential of LLMs in automated test generation. We also submitted 91 PALM-generated unit tests targeting new code; of these, 80 were accepted, 5 were rejected, and 6 remain pending review. These results demonstrate the effectiveness of integrating program analysis with AI and open new avenues for future research in automated software testing.
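
To make the idea of combining branching conditions into path constraints more concrete, the following is a minimal Rust sketch, not PALM's actual implementation. It assumes the branching conditions of a function are already available as strings, that they are sequential and independent (so every true/false combination is treated as a candidate path, with no pruning of infeasible paths), and that a prompt is plain text. All names here (`BranchOutcome`, `enumerate_paths`, `render_constraint`, `build_prompt`, and the example function `classify`) are hypothetical and chosen only for illustration.

```rust
/// One branching condition together with the outcome required on a path.
/// (Hypothetical type; the abstract does not describe PALM's data structures.)
#[derive(Debug, Clone, PartialEq)]
struct BranchOutcome {
    condition: String,
    taken: bool,
}

/// Enumerate every true/false combination of the given conditions.
/// Assumption: conditions are sequential and independent, so n conditions
/// yield 2^n candidate paths. Infeasible paths are not filtered out.
fn enumerate_paths(conditions: &[&str]) -> Vec<Vec<BranchOutcome>> {
    let mut paths: Vec<Vec<BranchOutcome>> = vec![Vec::new()];
    for cond in conditions {
        let mut next = Vec::with_capacity(paths.len() * 2);
        for path in &paths {
            for taken in [true, false] {
                let mut extended = path.clone();
                extended.push(BranchOutcome {
                    condition: cond.to_string(),
                    taken,
                });
                next.push(extended);
            }
        }
        paths = next;
    }
    paths
}

/// Render one path as a conjunction, negating conditions that are not taken.
fn render_constraint(path: &[BranchOutcome]) -> String {
    path.iter()
        .map(|b| {
            if b.taken {
                format!("({})", b.condition)
            } else {
                format!("!({})", b.condition)
            }
        })
        .collect::<Vec<_>>()
        .join(" && ")
}

/// Build a text prompt from a signature, some context, and one path constraint.
/// The wording of the prompt is an assumption made for this sketch.
fn build_prompt(signature: &str, context: &str, path: &[BranchOutcome]) -> String {
    format!(
        "Write a Rust unit test for `{}`.\nContext:\n{}\nThe test inputs must satisfy: {}\n",
        signature,
        context,
        render_constraint(path)
    )
}

fn main() {
    // Hypothetical function under test with two branching conditions.
    let conditions = ["x > 0", "y.is_empty()"];
    for path in enumerate_paths(&conditions) {
        let prompt = build_prompt(
            "fn classify(x: i32, y: &str) -> u8",
            "// source of `classify` and the types it uses would go here",
            &path,
        );
        println!("{}", prompt);
    }
}
```

Under these assumptions, each generated prompt targets one path through the function, so a model that satisfies the stated constraint exercises a specific combination of branches rather than whichever path a fixed prompt happens to elicit.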
