Distributed machine learning is critical for training deep learning models with large numbers of parameters on large datasets. Current research primarily focuses on leveraging additional hardware resources and powerful computing units to accelerate training, and larger batch sizes are therefore often employed to speed up the process. However, training with large batch sizes can lead to lower accuracy due to poor generalization. To address this issue, we propose the dual-batch learning scheme, a distributed training method built on the parameter server framework. This approach maximizes training efficiency by using the largest batch size the hardware can support while incorporating a smaller batch size to enhance model generalization. By using two different batch sizes simultaneously, the method improves accuracy with minimal additional training time. To mitigate the time overhead introduced by dual-batch learning, we further propose the cyclic progressive learning scheme, which cyclically increases the image resolution from low to high during training and thereby shortens training time. By combining cyclic progressive learning with dual-batch learning, our hybrid approach improves both model generalization and training efficiency. Experimental results with ResNet-18 demonstrate that, compared to conventional training methods, our approach improves accuracy by 3.3% while reducing training time by 10.1% on CIFAR-100, and further achieves a 34.8% reduction in training time on ImageNet.
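The sketch below is a minimal single-process illustration of the two ideas named in the abstract, assuming a PyTorch-style training loop with synthetic data. The specific batch sizes, the 4:1 interleaving ratio between large-batch and small-batch steps, the resolution schedule, and all helper names are illustrative assumptions, not the paper's parameter-server implementation.

```python
# Illustrative sketch only: batch sizes, resolution schedule, and the
# large/small-batch interleaving ratio are assumptions, not the paper's setup.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

LARGE_BATCH = 64               # stand-in for the largest batch the hardware supports
SMALL_BATCH = 16               # smaller batch assumed to aid generalization
RESOLUTIONS = [64, 96, 128]    # cyclic low-to-high resolution schedule (assumed)
NUM_CLASSES = 100

model = resnet18(num_classes=NUM_CLASSES)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

def synthetic_batch(batch_size, resolution):
    """Stand-in for a real data loader: random images and labels."""
    images = torch.randn(batch_size, 3, resolution, resolution)
    labels = torch.randint(0, NUM_CLASSES, (batch_size,))
    return images, labels

def train_step(batch_size, resolution):
    """One SGD update at the given batch size and input resolution."""
    images, labels = synthetic_batch(batch_size, resolution)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

for epoch in range(6):
    # Cyclic progressive learning: resolution repeatedly sweeps from low to high.
    resolution = RESOLUTIONS[epoch % len(RESOLUTIONS)]
    # Dual-batch learning: interleave large-batch steps (throughput) with
    # occasional small-batch steps (generalization); the 4:1 ratio is assumed.
    for _ in range(4):
        train_step(LARGE_BATCH, resolution)
    loss = train_step(SMALL_BATCH, resolution)
    print(f"epoch {epoch}: resolution={resolution}, small-batch loss={loss:.3f}")
```

In a real deployment the two batch sizes would be served by workers coordinated through the parameter server framework described above; this single-process loop only shows how the two schedules compose.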