Network library APIs have historically been developed with an emphasis on data movement, placement, and communication semantics. A wide variety of network libraries offer many communication semantics, such as send-receive, data streaming, put/get/atomic, RPC, active messages, and collective communication. In this work we introduce new compute and data movement APIs that overcome the constraints of the single-program, multiple-data (SPMD) programming model by allowing users to send binary executable code between processing elements. Our proof-of-concept implementation of the API is based on the UCX communication framework and leverages the RDMA network for fast compute migration. We envision the API being used to dispatch user functions from a host CPU to a SmartNIC (DPU), a computational storage drive (CSD), or remote servers. In addition, the API can be used by large-scale irregular applications (such as semantic graph analysis) that are composed of many coordinating tasks operating on a data set so large that it has to be stored on many physical devices. In such cases, it may be more efficient to choose dynamically where code runs as the application progresses.
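The idea of moving compute to the data can be illustrated with a minimal sketch. It is not the API proposed in this work: it assumes Python bytecode as a stand-in for binary executable code and a local `socket.socketpair()` as a stand-in for the UCX/RDMA transport. All function names (`pack_function`, `unpack_function`, `send_msg`, `recv_msg`, `demo`) are hypothetical and chosen only for illustration.

```python
import builtins
import marshal
import socket
import struct
import types


def pack_function(fn):
    """Serialize a plain function's name and bytecode into bytes."""
    return marshal.dumps((fn.__name__, fn.__code__))


def unpack_function(payload):
    """Rebuild a callable from bytes produced by pack_function."""
    name, code = marshal.loads(payload)
    # The rebuilt function only sees builtins, so the shipped code must
    # be self-contained (no closures, no references to module globals).
    return types.FunctionType(code, {"__builtins__": builtins}, name)


def send_msg(sock, data):
    """Send one length-prefixed message."""
    sock.sendall(struct.pack("!I", len(data)) + data)


def recv_exact(sock, n):
    """Read exactly n bytes from the socket."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def recv_msg(sock):
    """Receive one length-prefixed message."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def count_even(values):
    """Example user function to be shipped to the other endpoint."""
    total = 0
    for v in values:
        if v % 2 == 0:
            total += 1
    return total


def demo():
    # Assumption: both endpoints live in one process; in a real setting
    # "device" would be a SmartNIC, storage drive, or remote server.
    host, device = socket.socketpair()
    try:
        # Host side: ship the code, then its input.
        send_msg(host, pack_function(count_even))
        send_msg(host, marshal.dumps([1, 2, 3, 4, 6]))
        # Device side: rebuild the function, run it, return the result.
        fn = unpack_function(recv_msg(device))
        args = marshal.loads(recv_msg(device))
        send_msg(device, marshal.dumps(fn(args)))
        # Host side: collect the result.
        return marshal.loads(recv_msg(host))
    finally:
        host.close()
        device.close()
```

Calling `demo()` ships `count_even` across the socket pair and runs it on the receiving end. The sketch relies on two simplifications: the `marshal` format is specific to the Python version, so both endpoints must run the same interpreter, and executing received code is only acceptable between mutually trusted peers.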