Though data augmentation has rapidly emerged as a key tool for optimization in modern machine learning, a clear picture of how augmentation schedules affect optimization and interact with optimization hyperparameters such as learning rate is nascent. In the spirit of classical convex optimization and recent work on implicit bias, the present work analyzes the effect of augmentation on optimization in the simple convex setting of linear regression with MSE loss. We find joint schedules for learning rate and data augmentation scheme under which augmented gradient descent provably converges and characterize the resulting minimum. Our results apply to arbitrary augmentation schemes, revealing complex interactions between learning rates and augmentations even in the convex setting. Our approach interprets augmented (S)GD as a stochastic optimization method for a time-varying sequence of proxy losses. This gives a unified way to analyze learning rate, batch size, and augmentations ranging from additive noise to random projections. From this perspective, our results, which also give rates of convergence, can be viewed as Monro-Robbins type conditions for augmented (S)GD.
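To make the proxy-loss viewpoint concrete, the following is a minimal, hypothetical sketch (not the paper's code) of augmented SGD for linear regression with MSE loss. It uses additive Gaussian noise as the augmentation and illustrative schedules eta_t = 0.1/t for the learning rate and sigma_t = 1/sqrt(t) for the augmentation strength; these particular schedules are assumptions for the example, not the conditions derived in the paper. For this augmentation, the expected per-step objective is the ridge-like proxy loss ||Xw - y||^2 + n*sigma_t^2*||w||^2, so the pair (eta_t, sigma_t) jointly determines both convergence and the minimum reached.

```python
import numpy as np

# Hypothetical illustration: SGD on the MSE loss of an *augmented* minibatch,
# with jointly scheduled learning rate eta_t and augmentation strength sigma_t.
# For additive Gaussian input noise, E_noise ||(X + E) w - y||^2
#   = ||X w - y||^2 + n * sigma^2 * ||w||^2,
# i.e. each step optimizes a time-varying, ridge-regularized proxy loss.

rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)

def augment_additive_noise(X_batch, sigma):
    """Additive-noise augmentation: perturb each input with i.i.d. Gaussian noise."""
    return X_batch + sigma * rng.standard_normal(X_batch.shape)

w = np.zeros(d)
num_steps, batch_size = 2000, 32
for t in range(1, num_steps + 1):
    eta_t = 0.1 / t              # learning-rate schedule (assumed for illustration)
    sigma_t = 1.0 / np.sqrt(t)   # augmentation-strength schedule (assumed)

    idx = rng.choice(n, size=batch_size, replace=False)
    X_aug = augment_additive_noise(X[idx], sigma_t)

    # SGD step on the MSE loss of the augmented minibatch
    residual = X_aug @ w - y[idx]
    grad = 2.0 * X_aug.T @ residual / batch_size
    w -= eta_t * grad

# With decaying sigma_t, the iterates approach the unaugmented least-squares solution;
# a non-vanishing sigma_t would instead bias the limit toward a regularized minimum.
print("distance to least-squares solution:",
      np.linalg.norm(w - np.linalg.lstsq(X, y, rcond=None)[0]))
```

Swapping `augment_additive_noise` for another transformation (e.g. a random projection of the inputs) changes only the proxy loss being tracked, which is what makes this a unified way to reason about learning rate, batch size, and augmentation together.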