Valuable training data is often owned by independent organizations and located in multiple data centers. Most deep learning approaches require centralizing multi-datacenter data to achieve good performance. In practice, however, transferring all data to a single data center is often infeasible due to bandwidth limitations as well as privacy regulations. Model averaging is a conventional choice for data-parallel training, but previous studies have claimed it is ineffective because deep neural networks are typically non-convex. In this paper, we argue that model averaging can be effective in the decentralized setting through two strategies: a cyclical learning rate and an increased number of epochs for local model training. With these two strategies, we show that model averaging in the decentralized setting can deliver performance competitive with data-centralized training. We conduct extensive experiments in a practical multi-datacenter environment, using state-of-the-art deep network architectures on different types of data. The results demonstrate the effectiveness and robustness of the proposed method.
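To make the two strategies concrete, the following is a minimal sketch of decentralized model averaging combining a triangular cyclical learning rate with multiple local epochs per communication round. It uses a toy logistic-regression model on synthetic shards; all names (`cyclical_lr`, `local_train`, `num_centers`, the hyperparameter values) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def cyclical_lr(step, base_lr=0.01, max_lr=0.1, cycle_len=200):
    """Triangular cyclical learning rate: ramps base_lr -> max_lr -> base_lr."""
    pos = (step % cycle_len) / cycle_len        # position within the cycle, in [0, 1)
    tri = 1.0 - abs(2.0 * pos - 1.0)            # 0 -> 1 -> 0 over one cycle
    return base_lr + (max_lr - base_lr) * tri

def local_train(w, X, y, epochs, step_offset):
    """Run several local SGD epochs on one data center's shard."""
    step = step_offset
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            p = 1.0 / (1.0 + np.exp(-X[i] @ w))  # sigmoid prediction
            w = w - cyclical_lr(step) * (p - y[i]) * X[i]  # logistic-loss gradient step
            step += 1
    return w

# Synthetic shards standing in for independent data centers.
num_centers, dim, shard_size = 3, 5, 200
true_w = rng.normal(size=dim)
shards = []
for _ in range(num_centers):
    X = rng.normal(size=(shard_size, dim))
    y = (X @ true_w + 0.1 * rng.normal(size=shard_size) > 0).astype(float)
    shards.append((X, y))

w_global = np.zeros(dim)
local_epochs = 5                                 # the "increased local epochs" strategy
for rnd in range(10):                            # communication rounds
    offset = rnd * local_epochs * shard_size     # keep the LR cycle advancing across rounds
    local_models = [local_train(w_global.copy(), X, y, local_epochs, offset)
                    for X, y in shards]
    w_global = np.mean(local_models, axis=0)     # model averaging step
```

Only the averaged weights cross data-center boundaries here; the raw shards never leave their owners, which is what makes the approach compatible with the bandwidth and privacy constraints described above.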