Recently, Shah et al. (2020) pointed out the pitfalls of the simplicity bias, the tendency of gradient-based algorithms to learn simple models, which include the model's high sensitivity to small input perturbations as well as sub-optimal margins. In particular, while Stochastic Gradient Descent yields a max-margin decision boundary on linear models, this guarantee does not extend to non-linear models. To mitigate the simplicity bias, we consider uncertainty-driven perturbations (UDP) of the training data points, obtained iteratively by following the direction that maximizes the model's estimated uncertainty. Unlike loss-driven perturbations, uncertainty-driven perturbations do not cross the decision boundary, which allows a larger range of values for the hyperparameter controlling the perturbation magnitude. Moreover, since real-world datasets exhibit non-isotropic distances between data points of different classes, this property is particularly appealing for increasing the margin of the decision boundary, which in turn improves the model's generalization. We show that UDP is guaranteed to achieve the maximum-margin decision boundary on linear models and that it notably increases the margin on challenging simulated datasets. Interestingly, it also achieves a competitive trade-off between loss-based robustness and generalization on several datasets.
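As an illustration only (not the authors' implementation), the iterative uncertainty-ascent step can be sketched for a linear softmax classifier, using predictive entropy as the uncertainty estimate. Since entropy is maximized on the decision boundary, its gradient vanishes there, which is consistent with the stated property that the perturbation approaches but does not cross the boundary. The function name `udp_perturb`, the step size, and the iteration count are hypothetical choices:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def entropy(p):
    """Shannon entropy of a probability vector (the uncertainty estimate)."""
    return -np.sum(p * np.log(p + 1e-12))

def udp_perturb(x, W, b, step=0.1, n_steps=10):
    """Sketch of an uncertainty-driven perturbation: normalized gradient
    ascent on predictive entropy with respect to the input x, for the
    linear model p = softmax(W x + b). Hypothetical hyperparameters."""
    x = x.copy()
    for _ in range(n_steps):
        p = softmax(W @ x + b)
        H = entropy(p)
        # Analytic gradient of entropy w.r.t. the logits:
        # dH/dz_k = -p_k (log p_k + H)
        grad_z = -p * (np.log(p + 1e-12) + H)
        grad_x = W.T @ grad_z  # chain rule back to the input
        x += step * grad_x / (np.linalg.norm(grad_x) + 1e-12)
    return x
```

A perturbed point produced this way has higher predictive entropy (i.e., lies closer to the maximum-uncertainty region) than the original point, whereas a loss-driven perturbation would instead climb the loss and can push the point across the boundary.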