Programming stateful cloud applications remains a painful experience. Instead of focusing on business logic, programmers spend most of their time on distributed-systems concerns, the most important being consistency, load balancing, failure management, recovery, and scalability. At the same time, modern dataflow systems such as Apache Flink, Google Dataflow, and Timely Dataflow have seen unprecedented adoption. These systems are now performant and fault-tolerant, and they offer excellent state management primitives. In this line of work, we investigate the opportunities and limits of compiling general-purpose programs into stateful dataflows. By following a small set of code conventions, programmers can author stateful entities, a programming abstraction embedded in Python. We present StateFlow, a compiler pipeline that analyzes the abstract syntax tree of a Python application and rewrites it into an intermediate representation based on stateful dataflow graphs. StateFlow then compiles that intermediate representation to a target execution system: Apache Flink and Beam, AWS Lambda, Flink's Statefun, and Cloudburst. Through an experimental evaluation, we demonstrate that the code generated by StateFlow incurs minimal overhead. While developing and deploying our prototype, we observed important limitations of current dataflow systems in executing cloud applications at scale.
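To make the idea of analyzing a Python class as a stateful entity more concrete, the following is a minimal sketch using only Python's standard `ast` module. It is not StateFlow's actual API or implementation: the `Account` class, the `describe_entity` helper, and the convention that attributes assigned in `__init__` form the entity's state are all assumptions made for illustration.

```python
import ast
import textwrap

# Hypothetical entity source; the class and its conventions are illustrative only.
SOURCE = textwrap.dedent('''
    class Account:
        def __init__(self, owner: str):
            self.owner = owner
            self.balance = 0

        def deposit(self, amount: int) -> int:
            self.balance += amount
            return self.balance
''')


def describe_entity(source: str) -> dict:
    """Return the state fields and method names of the first class in `source`."""
    tree = ast.parse(source)
    cls = next(n for n in ast.walk(tree) if isinstance(n, ast.ClassDef))
    state, methods = [], []
    for item in cls.body:
        if not isinstance(item, ast.FunctionDef):
            continue
        if item.name == "__init__":
            # Assumed convention: every `self.<name> = ...` in the
            # constructor declares a piece of entity state.
            for node in ast.walk(item):
                if isinstance(node, ast.Assign):
                    for target in node.targets:
                        if (isinstance(target, ast.Attribute)
                                and isinstance(target.value, ast.Name)
                                and target.value.id == "self"):
                            state.append(target.attr)
        else:
            # Every other method is a candidate operation on that state.
            methods.append(item.name)
    return {"entity": cls.name, "state": state, "methods": methods}


print(describe_entity(SOURCE))
```

A compiler along the lines described above would presumably go much further, for example splitting methods at calls to other entities and emitting dataflow operators; this sketch only shows the first step of recovering state and operations from the syntax tree.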