Distributed deep learning is an effective way to reduce training time for large datasets and complex models. However, the limited scalability caused by network overhead makes it difficult to synchronize the parameters of all workers. To resolve this problem, gossip-based methods, which demonstrate stable scalability regardless of the number of workers, have been proposed. However, before gossip-based methods can be used in general cases, their validation accuracy for large mini-batches needs to be verified. To this end, we first empirically study the characteristics of gossip methods in the large mini-batch setting and observe that gossip methods preserve higher validation accuracy than AllReduce-SGD (Stochastic Gradient Descent) when the batch size is increased and the number of workers is fixed. However, the delayed parameter propagation of gossip-based models decreases validation accuracy at large node scales. To cope with this problem, we propose Crossover-SGD, which alleviates the delayed propagation of weight parameters via segment-wise communication and a load-balanced random network topology. We also adapt hierarchical communication to limit the number of workers participating in gossip-based communication. To validate the effectiveness of our proposed method, we conduct empirical experiments and observe that Crossover-SGD shows higher node scalability than SGP (Stochastic Gradient Push).
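To make the segment-wise gossip idea concrete, the following is a minimal sketch (not the authors' implementation) of segment-wise gossip averaging on a simulated set of workers. The segmentation scheme, peer selection, and all names (num_workers, num_segments, gossip_step) are illustrative assumptions; load balancing of the topology and hierarchical communication are omitted.

```python
# Minimal sketch of segment-wise gossip averaging (assumptions, not the
# paper's implementation): each worker exchanges each parameter *segment*
# with an independently chosen random peer, instead of sending its whole
# parameter vector to a single neighbor.
import numpy as np

num_workers = 8    # assumed number of workers
num_segments = 4   # parameters are split into this many segments
dim = 16           # total parameter dimension (divisible by num_segments)

rng = np.random.default_rng(0)
params = rng.normal(size=(num_workers, dim))  # one parameter vector per worker


def gossip_step(params):
    """One segment-wise gossip round: every worker averages each of its
    parameter segments with the corresponding segment of a randomly
    chosen peer, so each segment can propagate along a different path."""
    new = params.copy()
    seg_len = dim // num_segments
    for w in range(num_workers):
        for s in range(num_segments):
            peer = rng.choice([p for p in range(num_workers) if p != w])
            lo, hi = s * seg_len, (s + 1) * seg_len
            new[w, lo:hi] = 0.5 * (params[w, lo:hi] + params[peer, lo:hi])
    return new


for _ in range(10):
    params = gossip_step(params)

# Workers drift toward consensus; the spread across workers shrinks
# each round, which is the effect the segment-wise scheme accelerates.
print("mean std across workers:", params.std(axis=0).mean())
```

Because each segment follows an independent random path through the network, stale segments are refreshed more often than when the entire parameter vector travels together, which is the intuition behind mitigating delayed parameter propagation.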