The GUI is the bridge between a user and an application. Existing GUI testing tasks fall into two groups: functionality testing and compatibility testing. Functionality testing focuses on detecting application runtime bugs, whereas compatibility testing aims to detect bugs caused by device or platform differences. To automate testing procedures and improve testing efficiency, prior work has proposed dozens of tools. For functionality testing, researchers have published benchmarks to evaluate these tools. For compatibility testing, by contrast, the question ``Do existing methods indeed effectively assist test case replay?'' has not been well answered. To answer this question and advance research on GUI compatibility testing, we propose a benchmark for GUI compatibility testing. In our experiments, we compare the replay success rates of existing tools. Based on the experimental results, we summarize the causes that may lead to ineffective test case replay and identify opportunities for improving the state of the art.