The importance of ensemble computing is well established. However, executing ensembles at scale introduces performance fluctuations that have not been well investigated. In this paper, we trace our experience uncovering performance fluctuations in ensemble applications (primarily workflows of GROMACS tasks) and our so far unsuccessful attempts to discern their underlying cause(s). Is the failure to discern the causative or contributing factors a failure of capability? Or of imagination? Do the fluctuations have their genesis in some inscrutable aspect of the system or software? Do they warrant a fundamental reassessment of how we assume and conceptualize performance reproducibility? The answers to these questions are neither straightforward nor obvious. We conclude with a discussion of the performance of ensemble applications and reflect on the implications for how we define and measure application performance.