Dataflow applications, such as machine learning algorithms, can run for days, so it is desirable to have assurances that they will work correctly. Current tools fall short: too often the interactions between tasks are not type-safe, leading to run-time errors. This paper presents CircuitFlow, a new declarative Haskell embedded DSL (eDSL) for dataflow programming. CircuitFlow is defined as a Symmetric Monoidal Preorder (SMP) on data that models the dependencies in a workflow. This gives it a strong mathematical basis and refocuses attention on how data flows through an application, yielding a more expressive solution that not only catches errors statically but also achieves competitive run-time performance. In our preliminary evaluation, CircuitFlow outperforms Spotify's industry-leading Luigi library, scaling better with the number of inputs. The construction of CircuitFlow is also of note: it exemplifies how to create a modular eDSL whose semantics necessitate effects, and where storing complex type information for program correctness is paramount.