In recent years, scientific workflows have become increasingly popular. Their tasks are typically treated as black boxes, which makes it inherently hard to reason about their complex interrelations and to identify bottlenecks and optimization potential. The progress of a scientific workflow depends on several factors, including the available input data, the available computational power, and the I/O and network bandwidth. Here, we tackle the problem of predicting workflow progress with very low overhead. To this end, we look at suitable formalizations of the key parameters and their interactions that are sufficiently flexible to describe the input data consumption, the computational effort, and the output production of the workflow's tasks. At the same time, they allow for computationally simple and fast performance predictions, including a bottleneck analysis over the workflow runtime. A piecewise-defined bottleneck function is derived from the discrete intersections of the task models' limiting functions. This makes it possible to estimate the potential performance gains from overcoming the bottlenecks and can serve as a basis for optimized resource allocation and workflow execution.
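To illustrate how a piecewise-defined bottleneck function can arise from the intersections of limiting functions, here is a minimal Python sketch. The abstract does not specify the form of the limiting functions, so the sketch assumes each one is linear in time (`offset + rate * t`). The names `Limit` and `bottleneck_segments` and all numeric values are hypothetical and chosen only for illustration. At any moment, progress is bounded by the lowest limit, and the limit that is lowest changes only at pairwise intersection points.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    """Hypothetical linear limiting function: progress(t) <= offset + rate * t."""
    name: str
    rate: float
    offset: float = 0.0

    def at(self, t: float) -> float:
        return self.offset + self.rate * t


def bottleneck_segments(limits, t_end):
    """Return [(t_start, t_stop, name)]: the lowest limit on each time interval."""
    # Candidate breakpoints: pairwise intersections that fall inside (0, t_end).
    points = {0.0, t_end}
    for i, a in enumerate(limits):
        for b in limits[i + 1:]:
            if a.rate != b.rate:  # parallel lines never intersect
                t = (b.offset - a.offset) / (a.rate - b.rate)
                if 0.0 < t < t_end:
                    points.add(t)

    cuts = sorted(points)
    segments = []
    for lo, hi in zip(cuts, cuts[1:]):
        # Between two breakpoints the ordering of the lines is fixed,
        # so sampling the midpoint identifies the bottleneck.
        mid = (lo + hi) / 2
        name = min(limits, key=lambda lim: lim.at(mid)).name
        if segments and segments[-1][2] == name:
            # Same bottleneck as the previous interval: merge the two.
            segments[-1] = (segments[-1][0], hi, name)
        else:
            segments.append((lo, hi, name))
    return segments


if __name__ == "__main__":
    # Illustrative values only, not taken from the paper.
    limits = [
        Limit("input", rate=1.0, offset=0.0),
        Limit("cpu", rate=0.5, offset=2.0),
        Limit("network", rate=0.25, offset=4.0),
    ]
    for start, stop, name in bottleneck_segments(limits, t_end=10.0):
        print(f"{start:.1f} to {stop:.1f}: limited by {name}")
```

With these assumed values, the bottleneck moves from input data to computation and then to the network. The gap between the active limit and the next-lowest one on each segment gives a rough upper bound on what removing that bottleneck could gain.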