PGAS runtimes are well suited to irregular applications because they support short, one-sided messages. However, two major sources of overhead prevent PGAS runtimes from achieving acceptable performance on large-scale parallel systems. First, despite the availability of APIs that support non-blocking operations for important special cases, many PGAS operations on remote locations are synchronous by default, which can lead to long-latency stalls. Second, efficient inter-node communication requires careful aggregation and management of short messages. Experiments have shown that the performance of PGAS programs can be improved by more than 20$\times$ through the use of specialized lower-level libraries, such as Conveyors, but at a significant cost to programming productivity. Meanwhile, the actor model has been gaining popularity in many modern programming languages, such as Scala and Rust, and within the cloud computing community. In this paper, we introduce a new programming system for PGAS runtimes in which all remote operations are asynchronous by default, achieved through an actor-based programming model. In this approach, the programmer does not need to worry about the complexities of message aggregation and termination detection. Thus, our approach offers a desirable point in the productivity-performance spectrum, with scalable performance that approaches that of lower-level aggregation libraries but with higher productivity.
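To make the idea of asynchronous-by-default sends with hidden aggregation more concrete, the following is a minimal single-process sketch in Python. It is not the system introduced in the paper and does not use the Conveyors API; the class `AggregatingActorSim`, its methods, and the `run_histogram` driver are hypothetical names chosen for illustration. The sketch assumes a fixed number of simulated processing elements (PEs), a per-destination buffer that is flushed when it reaches `batch_size`, and a simple drain loop standing in for real termination detection.

```python
from collections import defaultdict


class AggregatingActorSim:
    """Hypothetical single-process simulation of PEs exchanging actor messages."""

    def __init__(self, num_pes, handler, batch_size=4):
        self.num_pes = num_pes
        self.handler = handler        # handler(sim, pe, payload), run "on" the destination PE
        self.batch_size = batch_size
        # buffers[src][dst] holds messages that have not been delivered yet
        self.buffers = [defaultdict(list) for _ in range(num_pes)]
        self.batches_sent = 0         # number of aggregated batches delivered

    def send(self, src, dst, payload):
        """Asynchronous send: only buffers the message, never waits for a reply."""
        buf = self.buffers[src][dst]
        buf.append(payload)
        if len(buf) >= self.batch_size:
            self._flush(src, dst)

    def _flush(self, src, dst):
        """Deliver one aggregated batch and run the handler for each message in it."""
        batch = self.buffers[src][dst]
        self.buffers[src][dst] = []   # swap first, so handlers may send safely
        self.batches_sent += 1
        for payload in batch:
            self.handler(self, dst, payload)

    def drain(self):
        """Flush partial buffers until none remain (stand-in for termination detection)."""
        pending = True
        while pending:
            pending = False
            for src in range(self.num_pes):
                for dst in list(self.buffers[src]):
                    if self.buffers[src][dst]:
                        self._flush(src, dst)
                        pending = True


def run_histogram(num_pes=4, table_size=16, updates_per_pe=10, batch_size=2):
    """Irregular-update example: each PE increments slots of a distributed table."""
    # The table is distributed cyclically: global index g lives on PE g % num_pes.
    local_counts = [[0] * (table_size // num_pes) for _ in range(num_pes)]

    def on_increment(sim, pe, local_index):
        # Message handler: executed at the owner, so the update itself is local.
        local_counts[pe][local_index] += 1

    sim = AggregatingActorSim(num_pes, on_increment, batch_size)
    for pe in range(num_pes):
        for k in range(updates_per_pe):
            g = (pe * 7 + k * 3) % table_size   # deterministic stand-in for a random index
            sim.send(pe, g % num_pes, g // num_pes)
    sim.drain()
    return local_counts, sim.batches_sent
```

In this sketch the application code only calls `send` and supplies a handler; buffering, batching, and the final drain are kept inside the simulated runtime, which is the division of labor the abstract describes. A real runtime would additionally have to deal with network transport, concurrency, and distributed termination detection, none of which are modeled here.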