Deep neural networks (DNNs) have great potential to solve many real-world problems, but they usually require an extensive amount of computation and memory. Deploying a large DNN model on a single resource-limited device with a small memory capacity is therefore difficult. Distributed computing is a common approach to reduce single-node memory consumption and to accelerate the inference of DNN models. In this paper, we explore within-layer model parallelism, which distributes the inference of each layer across multiple nodes. In this way, the memory requirement is spread over many nodes, making it possible to use several edge devices to infer a large DNN model. Due to the data dependencies within each layer, communication between nodes during this parallel inference can become a bottleneck when the communication bandwidth is limited. We propose a framework to train DNN models for Distributed Inference with Sparse Communications (DISCO). We convert the problem of selecting which subset of data to transmit between nodes into a model optimization problem, and derive models that reduce both computation and communication when each layer is inferred across multiple nodes. We show the benefit of the DISCO framework on a variety of computer vision tasks such as image classification, object detection, semantic segmentation, and image super resolution. The corresponding models include important DNN building blocks such as convolutions and transformers. For example, each layer of a ResNet-50 model can be distributively inferred across two nodes with 5x less data communication, roughly half the overall computation, and half the memory requirement per node, while achieving accuracy comparable to the original ResNet-50 model. This also results in a 4.7x overall inference speedup.
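The following is a minimal, hypothetical PyTorch sketch of the within-layer parallelism setting described above: a layer's channels are split across two "nodes" (here, two modules), and each node transmits only a small subset of its activations to its peer instead of the full tensor. The class and variable names (`NodeShard`, `send_ch`) and the fixed-channel selection are illustrative assumptions, not the paper's actual method, which learns what to communicate via model optimization.

```python
import torch
import torch.nn as nn

class NodeShard(nn.Module):
    """One node's shard of a layer: it computes its local output channels and
    exposes only a small subset of them for communication to the peer node."""
    def __init__(self, in_ch, out_ch, send_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.send_ch = send_ch  # number of channels shared with the peer

    def forward(self, x):
        y = self.conv(x)
        sent = y[:, : self.send_ch]  # the only data "communicated" to the peer
        return y, sent

# Two shards, each holding half of a 64-channel layer (32 channels per node).
node_a = NodeShard(in_ch=32, out_ch=32, send_ch=4)
node_b = NodeShard(in_ch=32, out_ch=32, send_ch=4)

xa = torch.randn(1, 32, 56, 56)  # node A's local input partition
xb = torch.randn(1, 32, 56, 56)  # node B's local input partition

ya, sent_a = node_a(xa)
yb, sent_b = node_b(xb)

# Each node's next layer sees its full local activations plus only the sparse
# channels received from the peer (4 of 32 here, i.e. 8x less communication
# than exchanging all activations).
input_next_a = torch.cat([ya, sent_b], dim=1)
input_next_b = torch.cat([yb, sent_a], dim=1)
print(input_next_a.shape, input_next_b.shape)  # torch.Size([1, 36, 56, 56]) each
```

In this sketch the communicated channels are fixed at construction time; DISCO instead treats the choice of what to transmit as part of training, so the communicated subset is learned jointly with the model weights.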