As neural networks are increasingly employed in machine learning practice, efficiently sharing limited training resources among a diverse set of model training tasks has become a crucial issue. To achieve better utilization of shared resources, in this paper we explore the idea of jointly training multiple neural network models on a single GPU. We realize this idea by proposing a primitive called pack. We further present a comprehensive empirical study of pack, together with end-to-end experiments that suggest significant improvements for hyperparameter tuning. The results suggest: (1) packing two models can bring up to a 40% performance improvement over unpacked setups for a single training step, and the improvement increases when packing more models; (2) the benefit of the pack primitive largely depends on a number of factors, including memory capacity, chip architecture, neural network structure, and batch size; (3) there exists a trade-off between packing and unpacking when training multiple neural network models on limited resources; (4) a pack-aware Hyperband is up to 2.7x faster than the original Hyperband, with this speedup growing as memory size, and hence the density of packed models, increases.
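To make the pack idea concrete, the following is a minimal, hypothetical PyTorch sketch, not the paper's implementation: several independent models are wrapped so that they share one GPU and one forward/backward pass per training step, while keeping separate parameters and losses. The `PackedModels` wrapper and the toy classifiers are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PackedModels(nn.Module):
    """Illustrative wrapper for the 'pack' idea (hypothetical, not the paper's code):
    run several independent models in a single training step on one device."""

    def __init__(self, models, loss_fn):
        super().__init__()
        self.models = nn.ModuleList(models)
        self.loss_fn = loss_fn

    def forward(self, x, y):
        # Each packed model processes the same batch; losses are summed so a
        # single backward pass computes gradients for all packed models at once.
        losses = [self.loss_fn(m(x), y) for m in self.models]
        return torch.stack(losses).sum(), losses


# Usage sketch: pack two small classifiers on one GPU (falls back to CPU).
device = "cuda" if torch.cuda.is_available() else "cpu"
models = [nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
          for _ in range(2)]
packed = PackedModels(models, nn.CrossEntropyLoss()).to(device)
opt = torch.optim.SGD(packed.parameters(), lr=0.1)

x = torch.randn(128, 32, device=device)
y = torch.randint(0, 10, (128,), device=device)

total_loss, per_model_losses = packed(x, y)
opt.zero_grad()
total_loss.backward()  # one backward pass covers every packed model
opt.step()
```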