In this short paper, we introduce the Ridgeline model, an extension of the Roofline model [4] to distributed systems. The Roofline model targets shared-memory systems: it bounds the performance of a kernel by its operational intensity together with the peak compute throughput and memory bandwidth of the executing system. In a distributed setting, where multiple compute entities communicate, the network must also be taken into account to model system behavior accurately. The Ridgeline model combines the compute, memory, and network limits in a single 2D plot that shows, in an intuitive way, which resource is the expected bottleneck. We demonstrate the applicability of the Ridgeline model in a case study of a data-parallel Multi-Layer Perceptron (MLP) instance.
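To make the bounding idea concrete, the following is a minimal sketch, not the formulation used in the paper. The first function is the standard Roofline bound: attainable performance is the minimum of the peak compute throughput and the memory bandwidth multiplied by the operational intensity. The second function is an assumed, straightforward extension that adds a network limit in the same way and reports which resource is the tightest. All function names, parameter names, and numbers are hypothetical and chosen only for illustration.

```python
def roofline_bound(peak_flops, mem_bw, op_intensity):
    """Standard Roofline bound in FLOP/s.

    peak_flops   : peak compute throughput (FLOP/s)
    mem_bw       : peak memory bandwidth (bytes/s)
    op_intensity : operational intensity (FLOP per byte of memory traffic)
    """
    return min(peak_flops, mem_bw * op_intensity)


def distributed_bound(peak_flops, mem_bw, net_bw, flops, mem_bytes, net_bytes):
    """Assumed three-resource bound (illustrative, not the paper's definition).

    flops     : floating-point operations performed by the kernel
    mem_bytes : bytes moved to and from memory
    net_bytes : bytes moved over the network
    Returns (attainable FLOP/s, name of the limiting resource).
    """
    limits = {
        "compute": peak_flops,
        # FLOP per memory byte times memory bandwidth
        "memory": mem_bw * flops / mem_bytes,
        # FLOP per network byte times network bandwidth
        "network": net_bw * flops / net_bytes,
    }
    bottleneck = min(limits, key=limits.get)
    return limits[bottleneck], bottleneck


if __name__ == "__main__":
    # Hypothetical machine: 1 TFLOP/s, 100 GB/s memory, 10 GB/s network.
    bound, resource = distributed_bound(
        peak_flops=1e12, mem_bw=1e11, net_bw=1e10,
        flops=2e9, mem_bytes=1e8, net_bytes=1e8,
    )
    print(f"bound = {bound:.3e} FLOP/s, limited by {resource}")
```

With these made-up numbers the kernel performs 20 FLOP per byte on both memory and network traffic, so the slower network is the limiting resource. Returning the name of the limiting resource mirrors the question the abstract says the plot is meant to answer, namely which resource is the expected bottleneck.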