Automating Test Case Identification in Java Open Source Projects on GitHub

Software testing is a key component of Quality Assurance (QA). Many researchers study the testing process in terms of tester motivation and how tests should or should not be written. However, these recommendations do not reveal how tests are actually written in real projects. In this paper, we investigated: (i) the denotation of the word "test" in different natural languages; (ii) whether the number of occurrences of the word "test" correlates with the number of test cases; and (iii) which testing frameworks are most commonly used. The analysis was performed on 38 GitHub open source repositories carefully selected from a set of 4.3M GitHub projects. We analyzed 20,340 test cases in 803 classes manually and 170k classes using an automated approach. The results show that: (i) there exists a weak correlation (r = 0.655) between the number of occurrences of the word "test" and the number of test cases in a class; (ii) the proposed algorithm using static file analysis correctly detected 97% of test cases; (iii) 15% of the analyzed classes used a main() function; these classes are regular Java programs that test the production code without using any third-party framework. Identifying such tests is very complex due to the diversity of their implementations. The results may be leveraged to identify and locate test cases in a repository more quickly, to understand practices in customized testing solutions, and to mine tests to improve program comprehension in the future.
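To make the quantities in the abstract concrete, the following is a minimal Java sketch of three static, text-based signals for a single source file: the number of occurrences of the word "test", the number of `@Test` annotations, and the presence of a `main()` entry point. It is not the algorithm proposed in the paper, whose details are not given here. The class name `TestHeuristics`, its method names, and the regular expressions are all assumptions chosen for illustration, and a regex scan is only a rough stand-in for real static analysis (for example, it also matches text inside comments and string literals).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not taken from the paper.
public class TestHeuristics {
    // Any case-insensitive occurrence of the substring "test".
    private static final Pattern WORD =
        Pattern.compile("test", Pattern.CASE_INSENSITIVE);

    // An @Test annotation (JUnit/TestNG style); the word boundary
    // excludes longer names such as @TestFactory.
    private static final Pattern ANNOTATION = Pattern.compile("@Test\\b");

    // A conventional entry point: main(String[] args) or main(String... args).
    // Assumption: the less common form "String args[]" is not handled.
    private static final Pattern MAIN = Pattern.compile(
        "public\\s+static\\s+void\\s+main\\s*\\(\\s*String\\s*"
        + "(\\[\\s*\\]|\\.\\.\\.)\\s*\\w+\\s*\\)");

    // Counts non-overlapping matches of a pattern in the source text.
    private static int count(Pattern pattern, String source) {
        Matcher matcher = pattern.matcher(source);
        int n = 0;
        while (matcher.find()) {
            n++;
        }
        return n;
    }

    public static int countWordTest(String source) {
        return count(WORD, source);
    }

    public static int countAnnotatedTests(String source) {
        return count(ANNOTATION, source);
    }

    public static boolean hasMainMethod(String source) {
        return MAIN.matcher(source).find();
    }

    public static void main(String[] args) {
        // Illustrative input only; not data from the study.
        String source = String.join("\n",
            "import org.junit.Test;",
            "public class CalculatorTest {",
            "    @Test public void testAdd() { }",
            "    @Test public void testSubtract() { }",
            "}");
        System.out.println("occurrences of \"test\": " + countWordTest(source));
        System.out.println("@Test annotations: " + countAnnotatedTests(source));
        System.out.println("has main(): " + hasMainMethod(source));
    }
}
```

In this sample the word count is higher than the annotation count because class names, method names, and imports also contain "test", which is one reason the word count alone can only be a rough proxy for the number of test cases. Classes that test through `main()` carry no annotation at all, so a signal such as `hasMainMethod` can only flag candidates for closer inspection.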

