We study the problem of distributed training of neural networks (NNs) on devices with heterogeneous, limited, and time-varying availability of computational resources. We present an adaptive, resource-aware, on-device learning mechanism, DISTREAL, which is able to fully and efficiently utilize the available resources on devices in a distributed manner, increasing the convergence speed. This is achieved with a dropout mechanism that dynamically adjusts the computational complexity of training an NN by randomly dropping filters of convolutional layers of the model. Our main contribution is the introduction of a design space exploration (DSE) technique, which finds Pareto-optimal per-layer dropout vectors with respect to the resource requirements and the convergence speed of the training. Applying this technique, each device is able to dynamically select the dropout vector that fits its available resources, without requiring any assistance from the server. We implement our solution in a federated learning (FL) system, where the availability of computational resources varies both between devices and over time, and show through extensive evaluation that we are able to significantly increase the convergence speed over the state of the art without compromising on the final accuracy.
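To make the filter-dropout idea concrete, the following is a minimal sketch of how a per-layer dropout vector could reduce the cost of one training step by convolving with only a random subset of each layer's filters. This is not the DISTREAL implementation; the model `SmallConvNet`, the `dropout_vector` argument, and the weight-slicing strategy are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the authors' code): per-layer filter dropout
# for a single training step, controlled by a per-layer dropout vector.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallConvNet(nn.Module):
    """Toy CNN; each conv layer trains only a random subset of its filters per step."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, 3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
        self.fc = nn.Linear(64, 10)

    def forward(self, x, dropout_vector=(0.0, 0.0)):
        # dropout_vector[i]: fraction of layer i's filters dropped in this step.
        prev_idx = torch.arange(x.shape[1], device=x.device)  # input channels kept so far
        for conv, p in zip((self.conv1, self.conv2), dropout_vector):
            keep = max(1, int(round(conv.out_channels * (1.0 - p))))
            idx = torch.randperm(conv.out_channels, device=x.device)[:keep]
            # Use only the surviving output filters and, for them, only the input
            # channels that survived in the previous layer, so the dropped filters
            # cost no computation and receive no gradient in this step.
            w = conv.weight[idx][:, prev_idx]
            b = conv.bias[idx] if conv.bias is not None else None
            x = F.max_pool2d(F.relu(F.conv2d(x, w, b, padding=1)), 2)
            prev_idx = idx
        # Restore the full channel dimension before the dense classifier.
        full = x.new_zeros(x.shape[0], self.conv2.out_channels, *x.shape[2:])
        full[:, prev_idx] = x
        return self.fc(full.mean(dim=(2, 3)))

# Usage: a device with few free resources could pick a more aggressive vector.
model = SmallConvNet()
x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))
loss = F.cross_entropy(model(x, dropout_vector=(0.25, 0.5)), y)
loss.backward()  # only the sampled filters accumulate non-zero gradients
```

In this sketch, the dropout vector is a plain tuple of per-layer drop rates; in the paper's setting, the candidate vectors would come from the DSE's Pareto front and be chosen at run time to match the device's currently available resources.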