Scientific workflows are a cornerstone of modern scientific computing. They are used to describe complex computational applications that require efficient and robust management of large volumes of data, which are typically stored and processed on heterogeneous, distributed resources. The workflow research and development community has employed a number of methods for the quantitative evaluation of existing and novel workflow algorithms and systems. In particular, a common approach is to simulate workflow executions. In previous work, we presented a collection of tools that the community has adopted for conducting workflow research. Despite their popularity, these tools suffer from several shortcomings that prevent easy adoption, maintenance, and consistency with the evolving structures and computational requirements of production workflows. In this work, we present WfCommons, a framework that provides a collection of tools for analyzing workflow executions, for producing generators of synthetic workflows, and for simulating workflow executions. We demonstrate the realism of the generated synthetic workflows by comparing their simulated executions to real workflow executions. We also contrast these results with those obtained using the previously available collection of tools. We find that the workflow generators automatically constructed by our framework not only generate representative same-scale workflows (i.e., with structures and task characteristic distributions that resemble those observed in real-world workflows), but also do so at scales larger than those of available real-world workflows. Finally, we conduct a case study to demonstrate the usefulness of our framework for estimating the energy consumption of large-scale workflow executions.