We present KumQuat, a system for automatically generating data-parallel implementations of Unix shell commands and pipelines. The generated parallel versions split input streams, execute multiple instantiations of the original pipeline commands to process the splits in parallel, and then combine the resulting parallel outputs to produce the final output stream. KumQuat automatically synthesizes the combine operators, using a domain-specific combiner language that acts as a strong regularizer and promotes efficient inference of correct combiners. We evaluate KumQuat on 70 benchmark scripts that together contain 427 stages. KumQuat synthesizes a correct combiner for 113 of the 121 unique commands that appear in these scripts. Synthesis times range from 39 to 331 seconds, with a median of 60 seconds. Experimental results show that these combiners enable effective parallelization of our benchmark scripts.
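To make the split/execute/combine scheme concrete, here is a minimal sketch. It assumes that commands are modeled as Python functions over lists of lines instead of real processes. All names in it (`wc_l`, `sort_cmd`, `grep_an`, `concat`, `add`, `merge`, `synthesize`, `run_parallel`) are hypothetical. The three-operator vocabulary and the random-split testing loop are a deliberately simplified stand-in, not KumQuat's actual combiner DSL or synthesis algorithm. The sketch shows how restricting the search to a small combiner language makes it cheap to find an operator `c` such that `c(cmd(x1), cmd(x2)) == cmd(x1 + x2)`.

```python
import heapq
import random
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Callable, List, Optional

Lines = List[str]
Command = Callable[[Lines], Lines]
Combiner = Callable[[Lines, Lines], Lines]

# Hypothetical stand-ins for Unix commands, modeled as functions on lists of lines.
def wc_l(lines: Lines) -> Lines:      # behaves like `wc -l`
    return [str(len(lines))]

def sort_cmd(lines: Lines) -> Lines:  # like `sort`, but Python str order, not locale collation
    return sorted(lines)

def grep_an(lines: Lines) -> Lines:   # behaves like `grep an`
    return [line for line in lines if "an" in line]

# A tiny, hypothetical combiner vocabulary (an assumption, not KumQuat's DSL).
def concat(a: Lines, b: Lines) -> Lines:
    return a + b

def add(a: Lines, b: Lines) -> Lines:
    (x,), (y,) = a, b                 # ValueError unless each side is exactly one line
    return [str(int(x) + int(y))]     # ValueError unless both lines are integers

def merge(a: Lines, b: Lines) -> Lines:
    return list(heapq.merge(a, b))    # merge two already-sorted outputs

CANDIDATES = {"concat": concat, "add": add, "merge": merge}

def synthesize(cmd: Command, trials: int = 50, seed: int = 0) -> Optional[str]:
    """Return the first candidate consistent with cmd on all random two-way splits."""
    rng = random.Random(seed)
    words = ["apple", "banana", "cherry", "mango", "date"]
    tests = []
    for _ in range(trials):
        xs = [rng.choice(words) for _ in range(rng.randint(0, 8))]
        cut = rng.randint(0, len(xs))
        tests.append((xs[:cut], xs[cut:]))
    for name, comb in CANDIDATES.items():
        try:
            if all(comb(cmd(a), cmd(b)) == cmd(a + b) for a, b in tests):
                return name
        except ValueError:
            continue                  # candidate does not apply to this output shape
    return None

def split(lines: Lines, k: int) -> List[Lines]:
    size = max(1, -(-len(lines) // k))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)] or [[]]

def run_parallel(cmd: Command, comb: Combiner, lines: Lines, k: int = 4) -> Lines:
    chunks = split(lines, k)
    with ThreadPoolExecutor(max_workers=k) as pool:
        outputs = list(pool.map(cmd, chunks))  # one instantiation per split
    return reduce(comb, outputs)               # fold partial outputs left to right

if __name__ == "__main__":
    data = ["mango", "apple", "banana", "date", "cherry", "banana", "apple"]
    for cmd in (wc_l, sort_cmd, grep_an):
        name = synthesize(cmd)
        assert run_parallel(cmd, CANDIDATES[name], data) == cmd(data)
        print(cmd.__name__, "->", name)
```

In this toy setting, `wc -l` needs numeric addition, `sort` needs a sorted merge, and a line filter such as `grep` needs only concatenation. A candidate that raises on a command's output shape is simply discarded, which is one way a restricted language can prune the search early.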