Distributed machine learning (ML) can bring more computational resources to bear than single-machine learning, reducing training time. Further, distribution allows models to be partitioned over many machines, allowing very large models to be trained -- models that may be much larger than the available memory of any individual machine. However, in practice, distributed ML remains challenging, primarily due to high communication costs. We propose a new approach to distributed neural network learning, called independent subnet training (IST). In IST, a neural network is decomposed into a set of subnetworks of the same depth as the original network, each of which is trained locally, before the subnets are exchanged and the process is repeated. IST training has many advantages over standard data parallel approaches. Because the subnets are independent, communication frequency is reduced. Because the original network is decomposed into independent parts, communication volume is reduced. Further, the decomposition makes IST naturally model parallel, and so IST scales to very large models that cannot fit on any single machine. We show experimentally that IST results in training times that are much lower than those of data parallel approaches to distributed learning, and that it scales to large models that cannot be learned using standard approaches.
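The decomposition described above can be illustrated with a minimal sketch. This is not the authors' implementation; it is a hypothetical NumPy example showing how the hidden units of a two-layer MLP might be partitioned into disjoint subnets of the same depth as the full network, dispatched to workers, and reassembled. The dimensions and the uniform random partition are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Full two-layer MLP: input -> hidden -> output (dimensions are illustrative).
d_in, d_hidden, d_out, n_workers = 8, 12, 4, 3
W1 = rng.standard_normal((d_in, d_hidden))
W2 = rng.standard_normal((d_hidden, d_out))

# Partition the hidden units into disjoint groups, one group per worker.
perm = rng.permutation(d_hidden)
groups = np.array_split(perm, n_workers)

# Each subnet keeps the full depth of the network but only its own slice
# of hidden units, so subnets can be trained locally with no communication.
subnets = [(W1[:, idx], W2[idx, :]) for idx in groups]

# After a round of local training, the subnets are reassembled into the
# full network (here the weights are unchanged, so reassembly is exact).
W1_new = np.empty_like(W1)
W2_new = np.empty_like(W2)
for idx, (w1, w2) in zip(groups, subnets):
    W1_new[:, idx] = w1
    W2_new[idx, :] = w2
```

Because each hidden unit belongs to exactly one subnet, each worker communicates only its slice of the weights, which is what reduces communication volume relative to data parallel training, where every worker exchanges the full model.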