Increasing the batch size of a deep learning model is a challenging task. Although a larger batch size helps utilize the full available system memory during the training phase, it most often results in a significant loss of test accuracy. LARS addresses this issue by introducing an adaptive learning rate for each layer of a deep learning model. However, it remains unclear how popular distributed machine learning systems such as SystemML or MLlib perform with this optimizer. In this work, we apply the LARS optimizer to a deep learning model implemented in SystemML. We run experiments with various batch sizes and compare the performance of the LARS optimizer with \textit{Stochastic Gradient Descent}. Our experimental results show that the LARS optimizer performs significantly better than Stochastic Gradient Descent for large batch sizes, even with the distributed machine learning framework SystemML.
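For reference, the layer-wise rule of the original LARS proposal (You et al., 2017) can be sketched as follows; the symbols $\eta$ (trust coefficient), $\beta$ (weight decay), and $\gamma_t$ (global learning-rate schedule) follow that paper's notation and are not defined elsewhere in this work:
\[
\lambda^{l} \;=\; \eta \,\frac{\lVert w^{l} \rVert_2}{\lVert \nabla L(w^{l}) \rVert_2 \;+\; \beta \,\lVert w^{l} \rVert_2},
\qquad
\Delta w_{t}^{l} \;=\; \gamma_t \,\lambda^{l}\, \nabla L(w_{t}^{l}),
\]
i.e., each layer $l$ scales the global step by the ratio of its weight norm to its gradient norm, keeping the update magnitude proportional to the weights even when the batch size, and hence the base learning rate, is large.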