Shuffle is a key primitive in large-scale data processing applications. The difficulty of large-scale shuffle has inspired the development of many specialized shuffle systems. While these systems greatly improve shuffle performance and reliability, they come at a cost: flexibility. First, each shuffle system is essentially built from scratch, which is a significant developer effort. Second, the monolithic design of these shuffle systems makes them too rigid to support fine-grained pipelining, as desired by applications like distributed ML training. We argue that the inflexibility stems from the tight coupling of shuffle algorithms and system-level optimizations, and propose to use the distributed futures abstraction to decouple shuffle algorithms from the system. We present Exoshuffle, an application-level shuffle design built on top of Ray, a task-based distributed futures system. We show that it is possible to (1) express shuffle algorithms from previous shuffle systems in a few hundred lines of application-level Python code, (2) achieve competitive performance and scalability with specialized data systems like Spark, and (3) achieve interoperability with other data applications via fine-grained pipelining.
翻译:大规模洗牌是大规模数据处理应用程序中的关键原始。 大规模洗牌的困难刺激了许多专门洗牌系统的开发。 虽然这些系统大大提高了洗牌的性能和可靠性, 但这些系统的成本是: 灵活性。 首先, 每个洗牌系统基本上都是从零开始, 这是一项重要的开发工作。 其次, 这些洗牌系统的单一设计使其过于僵硬, 无法支持微软的排气管, 正如分布式 ML 培训等应用程序所希望的那样。 我们争辩说, 软性不灵活源于洗牌算法和系统一级优化的紧密结合, 并提议使用分布式未来抽象来从系统进行调色调的洗牌算法。 我们介绍 Exsoshifle, 一种基于任务分布式未来系统的应用程序级设计。 我们表明, 有可能 (1) 将先前的洗牌系统从上划出, 以几百行的应用级平板代码为单位, 实现微竞争性的性能和可调性, 与专门的数据系统, 如平流式, 实现其他数据互操作性。 (3)