Benchmarks and performance experiments are frequently conducted in cloud environments. However, their results are often treated with caution, as the presumed high variability of performance in the cloud raises concerns about reproducibility and credibility. In a recent study, we empirically quantified the impact of this variability on benchmarking results by repeatedly executing a stream processing application benchmark at different times of the day over several months. Our analysis confirms that performance variability is indeed observable at the application level, although it is less pronounced than often assumed. The larger scale of our study compared to related work allowed us to identify subtle daily and weekly performance patterns. We now extend this investigation by examining whether a major global event, such as Black Friday, affects the outcomes of performance benchmarks.
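To illustrate the kind of analysis such a study involves, the following Python sketch shows how repeated benchmark measurements might be grouped by hour of day and day of week to surface daily and weekly patterns. The synthetic data, column names, and the coefficient-of-variation metric are illustrative assumptions, not the study's actual pipeline.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements: one benchmark run per hour over ~90 days.
# In a real study, each row would come from a full execution of the
# stream processing benchmark; here we synthesize throughput values
# with a mild diurnal dip purely for illustration.
rng = np.random.default_rng(42)
timestamps = pd.date_range("2023-09-01", periods=24 * 90, freq="h")
diurnal_dip = 150 * np.sin(2 * np.pi * timestamps.hour.to_numpy() / 24)
throughput = 10_000 - diurnal_dip + rng.normal(0, 200, size=len(timestamps))
runs = pd.DataFrame({"timestamp": timestamps, "throughput": throughput})

# Group by hour of day and day of week; the coefficient of variation (CV)
# expresses run-to-run spread relative to the mean, making variability
# comparable across groups.
runs["hour"] = runs["timestamp"].dt.hour
runs["weekday"] = runs["timestamp"].dt.day_name()

hourly = runs.groupby("hour")["throughput"].agg(["mean", "std"])
hourly["cv"] = hourly["std"] / hourly["mean"]

weekly = runs.groupby("weekday")["throughput"].agg(["mean", "std"])
weekly["cv"] = weekly["std"] / weekly["mean"]

print(hourly.round(1))   # daily pattern: does throughput dip at certain hours?
print(weekly.round(1))   # weekly pattern: do weekdays differ from weekends?
```

Extending this grouping to a flag for event days (e.g., runs executed during Black Friday week versus ordinary weeks) would follow the same pattern: add a boolean column and compare the groups' means and CVs.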