Microbenchmarking is a widely used form of performance testing in Java software. A microbenchmark repeatedly executes a small chunk of code while collecting measurements related to its performance. Due to Java Virtual Machine optimizations, microbenchmarks are usually subject to severe performance fluctuations in the first phase of their execution, known as warmup. For this reason, software developers typically discard measurements of this phase and focus their analysis on measurements collected once benchmarks reach a steady state of performance. Developers estimate the end of the warmup phase based on their expertise and configure their benchmarks accordingly. Unfortunately, this approach rests on two strong assumptions: (i) benchmarks always reach a steady state of performance, and (ii) developers accurately estimate the end of warmup. In this paper, we show that Java microbenchmarks do not always reach a steady state, and that developers often fail to accurately estimate the end of the warmup phase. Specifically, we found that a considerable portion of the studied benchmarks never reach a steady state, and that warmup estimates provided by software developers are often inaccurate, with large errors. This has significant implications for both result quality and time effort. Furthermore, we found that dynamic reconfiguration significantly improves warmup estimation accuracy, but it still induces suboptimal warmup estimates and notable side effects. We envision this paper as a starting point for the introduction of more sophisticated automated techniques that can ensure result quality in a timely fashion.
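To make the warmup-configuration practice concrete, the sketch below shows how a developer typically encodes a warmup estimate, assuming the benchmark uses JMH (the Java Microbenchmark Harness): warmup iterations are executed and discarded, and only the measurement iterations contribute to the reported results. The benchmark class, workload, and iteration counts are hypothetical and purely illustrative, not taken from the study.

```java
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Warmup;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
// The developer's warmup estimate: 5 one-second iterations are run
// and their measurements are discarded before data collection starts.
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
// Measurements assumed to reflect steady-state performance.
@Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(3) // repeat in 3 fresh JVM forks to average out per-JVM variation
public class StringConcatBenchmark {

    @Benchmark
    public String concat() {
        // Small chunk of code executed repeatedly by the harness;
        // returning the result prevents dead-code elimination.
        return "foo" + System.nanoTime();
    }
}
```

If the estimate encoded in `@Warmup` is too short, measurements are polluted by JVM optimization activity; if it is too long, benchmark runs waste time, which is exactly the accuracy/cost trade-off the paper examines.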