Recently, many first- and second-order variants of SGD have been proposed to facilitate the training of Deep Neural Networks (DNNs). A common limitation of these works is that they use the same learning rate for all instances in the dataset. This setting is widely adopted under the assumption that the loss functions of individual instances are similar in nature, and hence a common learning rate can be used. In this work, we relax this assumption and propose an optimization framework that accounts for differences in loss-function characteristics across instances. More specifically, our optimizer learns a dynamic learning rate for each instance in the dataset. Learning a per-instance dynamic learning rate allows our optimization framework to focus on different modes of the training data during optimization. When applied to an image classification task across different CNN architectures, learning dynamic learning rates yields consistent gains over standard optimizers. When applied to a dataset containing corrupt instances, our framework reduces the learning rates on noisy instances and improves over the state of the art. Finally, we show that our optimization framework can be used to personalize a machine learning model towards a known target data distribution.
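To make the idea of per-instance learning rates concrete, the sketch below shows SGD on a toy linear-regression problem where each training instance keeps its own rate. This is not the paper's learned optimizer: the multiplicative adaptation rule, the toy data, the corruption setup, and all variable names (`eta`, `prev_loss`, etc.) are illustrative assumptions standing in for the learned per-instance rates described above.

```python
import numpy as np

# Minimal sketch (illustrative, not the paper's method): SGD with one
# learning rate per training instance. A simple heuristic shrinks an
# instance's rate when its loss increases (as might happen for noisy or
# corrupt instances) and grows it slightly when its loss improves.

rng = np.random.default_rng(0)

# Toy data: y = X @ w_true + small noise, with the first 10 labels corrupted.
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.01 * rng.normal(size=n)
y[:10] += 5.0 * rng.normal(size=10)

w = np.zeros(d)
eta = np.full(n, 0.05)            # one learning rate per instance
prev_loss = np.full(n, np.inf)    # previous per-instance loss

for epoch in range(20):
    for i in rng.permutation(n):
        pred = X[i] @ w
        loss = 0.5 * (pred - y[i]) ** 2
        grad = (pred - y[i]) * X[i]
        # Per-instance adaptation: halve the rate if this instance's loss
        # went up, otherwise grow it mildly (capped for stability).
        eta[i] = eta[i] * 0.5 if loss > prev_loss[i] else min(eta[i] * 1.05, 0.1)
        prev_loss[i] = loss
        w -= eta[i] * grad

print("mean rate on corrupt instances:", eta[:10].mean())
print("mean rate on clean instances:  ", eta[10:].mean())
print("weight error:", np.linalg.norm(w - w_true))
```

Under this toy rule, the corrupt instances typically end up with smaller rates than the clean ones, mirroring the behavior the abstract reports for noisy data; the paper's framework instead learns these rates rather than setting them heuristically.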