Detecting Crowdsourced Test Report Consistency for Mobile Apps with Deep Image Understanding and Text Analysis

Crowdsourced testing, as a distinct testing paradigm, has attracted much attention in software testing, especially in the field of mobile application (app) testing. Compared with in-house testing, crowdsourced testing copes better with the mobile testing fragmentation problem because it draws on the diverse testing environments of different crowdworkers. However, crowdsourced testing also brings some problems. The crowdworkers involved have different levels of expertise, and they are not professional testers. Therefore, the reports they submit are numerous and of uneven quality. App developers have to distinguish high-quality reports from low-quality ones to support bug detection and fixing. Some crowdworkers submit inconsistent test reports, in which the textual description does not match the attached screenshot of the bug. Such reports waste both the time and the human resources of app development and testing. To solve this problem, we propose ReCoDe, which is designed to detect the consistency of crowdsourced test reports via deep image-and-text fusion understanding. First, according to a survey conducted beforehand, ReCoDe classifies crowdsourced test reports into 10 categories, which cover the vast majority of the problems reported in test reports. Then, for each category of bugs, ReCoDe applies a distinct processing model that performs a deep fusion understanding of both the image information and the textual description. We also conducted an experiment to evaluate ReCoDe, and the results show that ReCoDe is effective at detecting the consistency of crowdsourced test reports.
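The two-stage flow described in the abstract (categorize a report, then hand it to a category-specific consistency check) can be sketched as follows. This is a minimal illustration under stated assumptions, not ReCoDe's implementation: the category names, the keyword lists, the `screenshot_tags` field (a stand-in for the output of an image-understanding model), and every function name are hypothetical, since the abstract does not enumerate the 10 categories or describe the models.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Report:
    text: str  # crowdworker's textual description
    # Hypothetical stand-in for what an image-understanding model would extract.
    screenshot_tags: List[str] = field(default_factory=list)


# Hypothetical categories and keywords; the abstract does not list the real ones.
CATEGORY_KEYWORDS: Dict[str, List[str]] = {
    "crash": ["crash", "force close", "stopped"],
    "layout": ["overlap", "misaligned", "cut off"],
}


def classify(report: Report) -> str:
    """Stage 1: pick a category by simple keyword matching (illustrative only)."""
    lowered = report.text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "other"


def check_crash(report: Report) -> bool:
    # Assumed rule: a crash description should come with an error dialog screenshot.
    return "error_dialog" in report.screenshot_tags


def check_layout(report: Report) -> bool:
    # Assumed rule: a layout description should come with visibly overlapping widgets.
    return "overlapping_widgets" in report.screenshot_tags


def check_other(report: Report) -> bool:
    # Fallback: consistent if any screenshot tag also appears as a word in the text.
    words = set(report.text.lower().split())
    return any(tag in words for tag in report.screenshot_tags)


# Stage 2: one checker per category, mirroring "distinct processing models".
CHECKERS: Dict[str, Callable[[Report], bool]] = {
    "crash": check_crash,
    "layout": check_layout,
    "other": check_other,
}


def is_consistent(report: Report) -> bool:
    """Classify the report, then run the checker registered for its category."""
    return CHECKERS[classify(report)](report)
```

In a real system each checker would be a learned image-and-text fusion model rather than a tag lookup; the dispatch table only illustrates why a per-category design keeps each model focused on one kind of problem.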
