Distributed machine learning is critical for training deep learning models with numerous parameters on large datasets. Current research primarily focuses on leveraging additional hardware resources and powerful computing units to accelerate the training process. As a result, larger batch sizes are often employed to speed up training. However, training with large batch sizes can lead to lower accuracy due to poor generalization. To address this issue, we propose the dual batch size learning scheme, a distributed training method built on the parameter server framework. This approach maximizes training efficiency by utilizing the largest batch size that the hardware can support while incorporating a smaller batch size to enhance model generalization. By using two different batch sizes simultaneously, this method reduces testing loss and improves generalization with minimal extra training time. Additionally, to mitigate the time overhead introduced by dual batch size learning, we propose the cyclic progressive learning scheme. This technique gradually increases image resolution from low to high during training, significantly boosting training speed. By combining cyclic progressive learning with dual batch size learning, our hybrid approach improves both model generalization and training efficiency. Experimental results using ResNet-18 show that, compared to conventional training methods, our method improves accuracy by 3.3% while reducing training time by 10.6% on CIFAR-100, and improves accuracy by 0.1% while reducing training time by 35.7% on ImageNet.
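To make the two ideas concrete, the following is a minimal single-process PyTorch sketch of combining a large and a small batch in each update (dual batch size learning) with a low-to-high input-resolution schedule (cyclic progressive learning). The batch sizes, resolution schedule, and loader names are illustrative assumptions; the paper's parameter server implementation and actual hyperparameters are not reproduced here.

```python
# Illustrative sketch only: hypothetical batch sizes and resolution schedule,
# single process (no parameter server), random data in place of CIFAR-100.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(num_classes=100)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
criterion = nn.CrossEntropyLoss()

LARGE_BS, SMALL_BS = 256, 32          # hypothetical dual batch sizes
RESOLUTIONS = [16, 24, 32]            # hypothetical low-to-high resolution schedule

def train_epoch(loader_large, loader_small, resolution):
    model.train()
    for (x_l, y_l), (x_s, y_s) in zip(loader_large, loader_small):
        # Cyclic progressive learning: rescale inputs to the current resolution.
        x_l = F.interpolate(x_l, size=resolution, mode="bilinear", align_corners=False)
        x_s = F.interpolate(x_s, size=resolution, mode="bilinear", align_corners=False)
        # Dual batch size learning: accumulate gradients from a large batch
        # (hardware throughput) and a small batch (generalization) per update.
        optimizer.zero_grad()
        criterion(model(x_l), y_l).backward()
        criterion(model(x_s), y_s).backward()
        optimizer.step()

# Toy usage with random tensors standing in for real data loaders.
fake_large = [(torch.randn(LARGE_BS, 3, 32, 32), torch.randint(0, 100, (LARGE_BS,)))]
fake_small = [(torch.randn(SMALL_BS, 3, 32, 32), torch.randint(0, 100, (SMALL_BS,)))]
for resolution in RESOLUTIONS:
    train_epoch(fake_large, fake_small, resolution)
```

In this sketch the two batches share one optimizer step, so the small batch contributes a noisier gradient component intended to aid generalization, while early epochs at reduced resolution cut per-step compute; how the gradients are weighted and distributed across workers is a design choice of the actual method, not shown here.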