Quality assurance of mobile app GUIs has become increasingly important, as the GUI serves as the primary medium of interaction between users and apps. Although numerous automated GUI testing approaches have been developed with diverse strategies, a substantial gap remains between these approaches and the underlying app business logic. Most existing approaches focus on general exploration rather than the completion of specific testing scenarios, often missing critical functionalities. Inspired by manual testing, which treats business logic-driven scenarios as the fundamental unit of testing, this paper introduces an approach that leverages large language models to comprehend GUI semantics and contextual relevance to given scenarios. Building on this capability, we propose ScenGen, an LLM-guided scenario-based GUI testing framework that employs multi-agent collaboration to simulate and automate the phases of manual testing. Specifically, ScenGen integrates five agents: the Observer, Decider, Executor, Supervisor, and Recorder. The Observer perceives the app GUI state by extracting and structuring GUI widgets and layouts and interpreting their semantic information. These observations are passed to the Decider, which makes scenario-driven decisions with LLM guidance, identifying target widgets and determining the actions needed to fulfill specific goals. The Executor performs these operations, while the Supervisor verifies their alignment with the intended scenario completion, ensuring traceability and consistency. Finally, the Recorder logs GUI operations into context memory as a knowledge base for subsequent decision-making and monitors for runtime bugs. Comprehensive evaluations demonstrate that ScenGen effectively generates scenario-based GUI tests guided by LLM collaboration, achieving higher relevance to business logic and improving the completeness of automated GUI testing.
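The five-agent pipeline described above (Observer → Decider → Executor → Supervisor → Recorder) can be sketched as a simple control loop. The following Python sketch is purely illustrative: the class names mirror the agent roles from the abstract, but every method, data structure, and the stub decision policy are assumptions for exposition — in ScenGen itself the Decider's choice is made by an LLM, not a hand-coded rule.

```python
# Hypothetical sketch of a scenario-driven, five-agent GUI testing loop.
# All names and structures are illustrative, not ScenGen's actual API.

class Observer:
    def perceive(self, gui_state):
        # Extract and structure widgets/layout into a semantic observation.
        return {"screen": gui_state["screen"], "widgets": gui_state["widgets"]}

class Decider:
    def decide(self, observation, scenario, memory):
        # Stub policy: pick the first widget not yet operated on.
        # (In ScenGen, an LLM picks the target widget and action.)
        for widget in observation["widgets"]:
            if widget not in memory:
                return {"action": "tap", "target": widget}
        return None  # nothing left to try for this scenario step

class Executor:
    def execute(self, decision):
        # Perform the GUI operation; here we just return its string form.
        return f"{decision['action']}:{decision['target']}"

class Supervisor:
    def verify(self, decision, scenario):
        # Check that the executed action stays relevant to the scenario goal.
        return decision["target"] in scenario["relevant_widgets"]

class Recorder:
    def __init__(self):
        self.memory = []  # context memory: knowledge base for later decisions

    def log(self, operated_widget):
        self.memory.append(operated_widget)

def run_step(gui_state, scenario, observer, decider, executor, supervisor, recorder):
    observation = observer.perceive(gui_state)
    decision = decider.decide(observation, scenario, recorder.memory)
    if decision is None:
        return None
    operation = executor.execute(decision)
    if supervisor.verify(decision, scenario):
        recorder.log(decision["target"])  # only verified steps enter memory
    return operation
```

A driver would call `run_step` repeatedly, feeding the Recorder's growing context memory back into the Decider until the scenario goal is reached — the feedback loop the abstract attributes to the agents' collaboration.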