The amount of data generated by numerical simulations in various scientific domains, such as molecular dynamics, climate modeling, biology, or astrophysics, has led to a fundamental redesign of application workflows. The throughput and capacity of storage subsystems have not evolved as fast as the computing power of extreme-scale supercomputers. As a result, the classical post-hoc analysis of simulation outputs has become highly inefficient. In-situ workflows have emerged as a solution in which simulation and data analytics are intertwined through shared computing resources, thus lowering latencies. Determining the best allocation, i.e., how many resources to allocate to each component of an in-situ workflow, and the best mapping, i.e., where and at which frequency to run the data analytics component, is a complex task, and assessing the performance of these choices is crucial to the efficient execution of in-situ workflows. However, such a performance evaluation of different allocation and mapping strategies usually relies either on running them directly on the targeted execution environments, which can rapidly become extremely time- and resource-consuming, or on simulating simplified models of the components of an in-situ workflow, which can lack realism. In both cases, the validity of the performance evaluation is limited. To address this issue, we introduce SIM-SITU, a framework for the faithful simulation of in-situ workflows. This framework builds on the SimGrid toolkit and benefits from several important features of this versatile simulation tool. We designed SIM-SITU to reflect the typical structure of in-situ workflows. Thanks to its modular design, SIM-SITU has the flexibility needed to easily and faithfully evaluate the behavior and performance of various allocation and mapping strategies for in-situ workflows. We illustrate the simulation capabilities of SIM-SITU on a Molecular Dynamics use case.
We study the impact of different allocation and mapping strategies on performance and show how users can leverage SIM-SITU to determine interesting tradeoffs when designing their in-situ workflow.