We present a dataflow model for parallel Unix shell pipelines. To accurately capture the semantics of complex Unix pipelines, the dataflow model is order-aware, i.e., the order in which a node in the dataflow graph consumes inputs from its different edges plays a central role in the semantics of the computation and therefore in the resulting parallelization. We use this model to capture the semantics of transformations that exploit the data parallelism available in Unix shell computations and to prove their correctness. We additionally formalize the translations from the Unix shell to the dataflow model and from the dataflow model back to a parallel shell script. We implement the model and transformations as the compiler and optimization passes of a system that parallelizes shell pipelines, and use it to evaluate the speedup achieved on 47 pipelines.
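To make the kind of data-parallel transformation described above concrete, the following is a minimal, hand-written split/apply/merge sketch of a simple pipeline. It is illustrative only, not the system's actual compiler output; the input file name and the GNU-specific split -n l/4 flag are assumptions made for the example.

    # Sequential pipeline: count distinct ERROR lines in a log.
    grep 'ERROR' access.log | sort | uniq -c

    # Hand-parallelized sketch of the same computation.
    split -n l/4 access.log chunk.       # split the input into 4 line-aligned chunks (GNU split)
    for c in chunk.??; do                # run the data-parallel stages on each chunk concurrently
        grep 'ERROR' "$c" | sort > "$c.sorted" &
    done
    wait                                 # wait for all background jobs to finish
    sort -m chunk.??.sorted | uniq -c    # order-aware merge of the sorted chunks, then aggregate
    rm -f chunk.?? chunk.??.sorted       # clean up temporary files

The correctness of such a rewrite hinges on the order-awareness noted above: grep and sort run independently per chunk, but the merge step (sort -m) must preserve the ordering that the downstream uniq -c relies on for the parallel pipeline to be equivalent to the sequential one.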