Ensuring that software performance does not degrade after a code change is paramount. A potential solution, particularly for libraries and frameworks, is to regularly execute software microbenchmarks, a performance testing technique similar to (functional) unit tests. However, this often becomes infeasible because of the extensive runtimes of microbenchmark suites. To address that challenge, research has investigated regression testing techniques, such as test case prioritization (TCP), which reorder the execution within a microbenchmark suite to detect larger performance changes sooner. Such techniques are either designed for unit tests and perform sub-par on microbenchmarks, or they require complex performance models, which drastically reduces their applicability. In this paper, we propose a search-based technique based on multi-objective evolutionary algorithms (MOEAs) to improve the current state of microbenchmark prioritization. The technique uses three objectives: coverage (to maximize), coverage overlap (to minimize), and historical performance change detection (to maximize). We find that our technique improves over the best coverage-based, greedy baselines in terms of average percentage of fault-detection on performance (APFD-P) and Top-3 effectiveness by 26 percentage points (pp) and 43 pp (for Additional) and 17 pp and 32 pp (for Total) to 0.77 and 0.24, respectively. Employing the Indicator-Based Evolutionary Algorithm (IBEA) as the MOEA leads to the best effectiveness among six MOEAs. Finally, the technique's runtime overhead is acceptable at 19% of the overall benchmark suite runtime, considering that these suites often run for multiple hours. The added overhead compared to the greedy baselines is minuscule at 1%. These results mark a step forward for universally applicable performance regression testing techniques.
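To make the three objectives more concrete, the following is a minimal sketch of how a candidate benchmark ordering might be scored and how two orderings might be compared by Pareto dominance. It is not the formulation used in the paper: the position weighting, the function names `score_ordering` and `dominates`, and the input structures (`coverage` as sets of covered code units, `hist_change` as a per-benchmark historical change size) are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

Scores = Tuple[float, float, float]  # (coverage, overlap, historical change)


def score_ordering(
    order: List[str],
    coverage: Dict[str, Set[str]],
    hist_change: Dict[str, float],
) -> Scores:
    """Score one ordering on three objectives (hypothetical weighting).

    Earlier positions get a larger weight, so an ordering scores well when
    benchmarks with high coverage and large historical performance changes
    run first, and when benchmarks that re-cover already covered code run late.
    """
    n = len(order)
    covered: Set[str] = set()
    cov_score = overlap_score = hist_score = 0.0
    for pos, bench in enumerate(order):
        weight = (n - pos) / n  # assumed linear position weight
        units = coverage[bench]
        cov_score += weight * len(units)                 # to maximize
        overlap_score += weight * len(units & covered)   # to minimize
        hist_score += weight * hist_change[bench]        # to maximize
        covered |= units
    return cov_score, overlap_score, hist_score


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is no worse than b on every objective and better on one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1] and a[2] >= b[2]
    better = a[0] > b[0] or a[1] < b[1] or a[2] > b[2]
    return no_worse and better


if __name__ == "__main__":
    # Toy data, invented for this example only.
    coverage = {"b1": {"f", "g"}, "b2": {"g"}, "b3": {"h"}}
    hist_change = {"b1": 0.1, "b2": 0.5, "b3": 0.0}
    first = score_ordering(["b1", "b2", "b3"], coverage, hist_change)
    second = score_ordering(["b2", "b1", "b3"], coverage, hist_change)
    print(first, second, dominates(first, second), dominates(second, first))
```

In this toy example neither ordering dominates the other: one is better on coverage, the other on historical change detection. A multi-objective search explores exactly this kind of trade-off, whereas a greedy baseline optimizes a single criterion.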