Traditional deep network training methods optimize a monolithic objective function jointly for all components, which limits the degree to which training can be parallelized. Local learning is an approach to model parallelism that removes the standard end-to-end learning setup and uses local objective functions to permit parallel training of the components of a deep network. Recent works have demonstrated that variants of local learning can enable efficient training of modern deep networks. However, in terms of how much computation can be distributed, these approaches are typically limited by the number of layers in the network. In this work we study how local learning can be applied at the level of splitting layers or modules into sub-components, adding a notion of width-wise modularity to the depth-wise modularity already associated with local learning. We investigate local-learning penalties that allow such models to be trained efficiently. Our experiments on the CIFAR-10, CIFAR-100, and Imagenet32 datasets demonstrate that introducing width-level modularity can yield computational advantages over existing methods based on local learning and opens new opportunities for improved model-parallel distributed training. Code is available at: https://github.com/adeetyapatel12/GN-DGL.
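To make the idea concrete, below is a minimal, illustrative sketch (not the authors' GN-DGL implementation) of local learning with width-wise modularity in PyTorch. Each depth-wise block is split into several width-wise groups; every group has its own auxiliary head and local loss, and detached inputs keep gradients from crossing group or block boundaries. All class names (`LocalGroup`, `WidthwiseLocalBlock`) and layer choices are hypothetical and shown only to convey the structure.

```python
# Illustrative sketch of width-wise local learning (assumptions noted above),
# not the official GN-DGL code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LocalGroup(nn.Module):
    """One width-wise sub-component trained with its own local objective."""

    def __init__(self, in_ch, out_ch, num_classes):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        # Auxiliary classifier used only to compute the local loss.
        self.aux = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(out_ch, num_classes)
        )

    def forward(self, x, y):
        h = self.body(x)
        local_loss = F.cross_entropy(self.aux(h), y)
        return h, local_loss


class WidthwiseLocalBlock(nn.Module):
    """A depth-wise block split into groups that can be trained in parallel."""

    def __init__(self, in_ch, out_ch, num_groups, num_classes):
        super().__init__()
        self.groups = nn.ModuleList(
            LocalGroup(in_ch, out_ch // num_groups, num_classes)
            for _ in range(num_groups)
        )

    def forward(self, x, y):
        outs, losses = [], []
        for g in self.groups:
            # Detach the input so no gradient flows back into earlier blocks
            # or across sibling groups; each group optimizes only its own loss.
            h, loss = g(x.detach(), y)
            outs.append(h)
            losses.append(loss)
        return torch.cat(outs, dim=1), sum(losses)
```

Because each group's backward pass depends only on its own parameters and local loss, the per-group updates can in principle be dispatched to separate devices, which is the width-wise analogue of the depth-wise decoupling used in prior local-learning work.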