SAINT: Service-level Integration Test Generation with Program Analysis and LLM-based Agents

Enterprise applications are typically tested at multiple levels, with service-level testing playing an important role in validating application functionality. Existing service-level testing tools, especially those for RESTful APIs, often rely on fuzzing, on OpenAPI specifications, or on both, and such specifications are often unavailable in real-world enterprise codebases. Moreover, these tools are limited in their ability to generate functional tests that exercise meaningful application scenarios. In this work, we present SAINT, a novel white-box approach for service-level testing of enterprise Java applications. SAINT combines static analysis, large language models (LLMs), and LLM-based agents to automatically generate endpoint tests and scenario-based tests. The approach builds two key models: an endpoint model, which captures syntactic and semantic information about service endpoints, and an operation dependency graph, which captures ordering constraints between endpoints. SAINT then uses LLM-based agents to generate tests. Endpoint-focused tests aim to maximize code coverage and coverage of database interactions. Scenario-based tests are synthesized by extracting application use cases from the code and refining them into executable tests through the planning, action, and reflection phases of an agentic loop. We evaluated SAINT on eight Java applications, including a proprietary enterprise application. Our results demonstrate the effectiveness of SAINT in terms of coverage, fault detection, and scenario generation. In addition, a developer survey strongly endorses the scenario-based tests generated by SAINT. Overall, our work shows that combining static analysis with agentic LLM workflows enables more effective, functional, and developer-aligned service-level test generation.
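The abstract does not say how the operation dependency graph is represented. As a minimal sketch, assume each endpoint is summarized by the resource types it creates and the resource types it requires to exist beforehand. Under that assumption, an ordering constraint becomes a graph edge, and a valid call sequence for a test is a topological order of the graph (Kahn's algorithm is used here). The `Endpoint` class, its `produces`/`consumes` fields, and both functions are hypothetical illustrations, not SAINT's actual data model or API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical endpoint-model entry (illustrative, not SAINT's model):
    an HTTP operation plus the resource types it creates (produces) and
    the resource types that must already exist before it is called (consumes)."""
    method: str
    path: str
    produces: frozenset = field(default_factory=frozenset)
    consumes: frozenset = field(default_factory=frozenset)

    @property
    def name(self) -> str:
        return f"{self.method} {self.path}"


def build_dependency_graph(endpoints):
    """Hypothetical ODG construction: add an edge A -> B whenever A produces
    a resource type that B consumes, so A must be called before B."""
    graph = {ep.name: set() for ep in endpoints}
    for a in endpoints:
        for b in endpoints:
            if a is not b and a.produces & b.consumes:
                graph[a.name].add(b.name)
    return graph


def valid_call_order(graph):
    """Kahn's topological sort over the graph; raises ValueError if the
    ordering constraints are cyclic and no valid sequence exists."""
    indegree = {node: 0 for node in graph}
    for targets in graph.values():
        for t in targets:
            indegree[t] += 1
    # Sorting the ready nodes makes the resulting order deterministic.
    ready = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for t in sorted(graph[node]):
            indegree[t] -= 1
            if indegree[t] == 0:
                ready.append(t)
    if len(order) != len(graph):
        raise ValueError("cyclic ordering constraints")
    return order


# Usage: a customer must exist before an order, and an order before it is read.
endpoints = [
    Endpoint("POST", "/orders", produces=frozenset({"Order"}), consumes=frozenset({"Customer"})),
    Endpoint("POST", "/customers", produces=frozenset({"Customer"})),
    Endpoint("GET", "/orders/{id}", consumes=frozenset({"Order"})),
]
order = valid_call_order(build_dependency_graph(endpoints))
print(order)  # ['POST /customers', 'POST /orders', 'GET /orders/{id}']
```

A producer/consumer encoding is only one plausible way to derive such constraints; in a white-box setting, the resource types could instead come from static analysis of request and response types, but the sketch does not attempt that.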
