Recent deep learning models are difficult to train with a large batch size because commodity machines may not have enough memory to hold both the model and a large data batch. The batch size is one of the hyper-parameters of the training process, and it is limited by the target machine's memory capacity: the batch can only occupy the memory that remains after the model is loaded. The size of each data item is also an important factor, since larger data items further reduce the batch size that fits in the remaining memory. This paper proposes a framework called Micro-Batch Streaming (MBS) to address this problem. MBS splits a batch into micro-batches small enough to fit in the remaining memory and streams them to the device sequentially, while a loss normalization algorithm based on gradient accumulation preserves training performance. The goal of our method is to let deep learning models train with batch sizes that exceed the memory capacity of a single system, without increasing memory size or using multiple devices (GPUs).
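To illustrate the general idea behind streaming micro-batches with loss normalization, the following is a minimal sketch using plain gradient accumulation in PyTorch. It is not the authors' MBS framework; the sizes (GLOBAL_BATCH, MICRO_BATCH), the toy model, and the data are hypothetical stand-ins chosen only to show how dividing the loss by the number of accumulation steps makes the accumulated gradient match what one large batch would have produced.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical sizes for illustration only.
GLOBAL_BATCH = 256        # desired (large) batch size
MICRO_BATCH = 32          # size that actually fits in device memory
ACCUM_STEPS = GLOBAL_BATCH // MICRO_BATCH

device = "cuda" if torch.cuda.is_available() else "cpu"

# A toy model and synthetic data; stand-ins for any real workload.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10)).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

inputs = torch.randn(1024, 128)
targets = torch.randint(0, 10, (1024,))
# The loader yields micro-batches; ACCUM_STEPS of them emulate one large batch.
loader = DataLoader(TensorDataset(inputs, targets), batch_size=MICRO_BATCH)

optimizer.zero_grad()
for step, (x, y) in enumerate(loader, start=1):
    x, y = x.to(device), y.to(device)
    loss = criterion(model(x), y)
    # Normalize the loss so that the gradients accumulated over ACCUM_STEPS
    # micro-batches equal the gradient of one batch of size GLOBAL_BATCH.
    (loss / ACCUM_STEPS).backward()
    if step % ACCUM_STEPS == 0:
        # One optimizer update per emulated large batch.
        optimizer.step()
        optimizer.zero_grad()
```

In this sketch only micro-batch-sized activations are ever resident on the device at once, which is what allows the emulated batch size to exceed the memory that remains after the model is loaded.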