Distributionally robust optimization (DRO) and invariant risk minimization (IRM) are two popular methods proposed to improve the out-of-distribution (OOD) generalization performance of machine learning models. While effective for small models, these methods have been observed to be vulnerable to overfitting with large overparameterized models. This work proposes a principled method, \textbf{M}odel \textbf{A}gnostic sam\textbf{PL}e r\textbf{E}weighting (\textbf{MAPLE}), to effectively address the OOD problem, especially in overparameterized scenarios. Our key idea is to find an effective reweighting of the training samples so that standard empirical risk minimization training of a large model on the weighted training data leads to superior OOD generalization performance. The overfitting issue is addressed by a bilevel formulation that searches over sample reweightings, in which the generalization complexity depends on the search space of sample weights rather than on the model size. We present a theoretical analysis in the linear case to prove the insensitivity of MAPLE to model size, and empirically verify that it surpasses state-of-the-art methods by a large margin. Code is available at \url{https://github.com/x-zho14/MAPLE}.
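The bilevel idea above can be illustrated numerically: the inner problem fits a model by weighted ERM, and the outer problem adjusts the sample weights to reduce loss on held-out data whose distribution differs from training. The following is a hypothetical NumPy toy sketch, not the authors' implementation: the inner ERM is a weighted ridge regression solved in closed form, the outer update uses finite-difference gradients with a backtracking step, and all data, dimensions, and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: training data contains a spurious feature; held-out data does not.
n_tr, n_val, d = 40, 40, 2
X_tr = rng.normal(size=(n_tr, d))
y_tr = X_tr[:, 0] + 0.1 * rng.normal(size=n_tr)
X_tr[: n_tr // 2, 1] = y_tr[: n_tr // 2]  # spurious correlation on half the samples
X_val = rng.normal(size=(n_val, d))
y_val = X_val[:, 0] + 0.1 * rng.normal(size=n_val)

def inner_solve(w):
    """Inner problem: weighted ridge regression (ERM on reweighted data), closed form."""
    A = X_tr.T @ (w[:, None] * X_tr) + 1e-3 * np.eye(d)
    return np.linalg.solve(A, X_tr.T @ (w * y_tr))

def outer_loss(w):
    """Outer objective: mean squared error of the inner solution on held-out data."""
    r = X_val @ inner_solve(w) - y_val
    return float(r @ r) / n_val

# Outer loop: descend on the sample weights via finite-difference gradients.
w = np.ones(n_tr)
eps = 1e-4
for _ in range(60):
    base = outer_loss(w)
    g = np.zeros(n_tr)
    for i in range(n_tr):
        w_pert = w.copy()
        w_pert[i] += eps
        g[i] = (outer_loss(w_pert) - base) / eps
    # Backtracking: accept the largest step that improves the outer loss.
    for lr in (4.0, 1.0, 0.25):
        cand = np.clip(w - lr * g, 0.0, None)  # keep weights nonnegative
        if outer_loss(cand) < base:
            w = cand
            break

print("uniform-weight held-out loss:", round(outer_loss(np.ones(n_tr)), 4))
print("reweighted     held-out loss:", round(outer_loss(w), 4))
```

Because the outer search is over the `n_tr` sample weights rather than the model parameters, the complexity of the search space is decoupled from the model size, which is the mechanism the abstract credits for avoiding overfitting.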