Model Agnostic Sample Reweighting for Out-of-Distribution Learning

Distributionally robust optimization (DRO) and invariant risk minimization (IRM) are two popular methods proposed to improve the out-of-distribution (OOD) generalization performance of machine learning models. While effective for small models, these methods have been observed to be vulnerable to overfitting with large overparameterized models. This work proposes a principled method, \textbf{M}odel \textbf{A}gnostic sam\textbf{PL}e r\textbf{E}weighting (\textbf{MAPLE}), to effectively address the OOD problem, especially in overparameterized scenarios. Our key idea is to find an effective reweighting of the training samples so that standard empirical risk minimization training of a large model on the weighted training data leads to superior OOD generalization performance. The overfitting issue is addressed by considering a bilevel formulation to search for the sample reweighting, in which the generalization complexity depends on the search space of sample weights instead of the model size. We present a theoretical analysis in the linear case to prove the insensitivity of MAPLE to model size, and empirically verify that it surpasses state-of-the-art methods by a large margin. Code is available at \url{https://github.com/x-zho14/MAPLE}.
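The bilevel idea in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm or code: it assumes a scalar linear model, two groups with known labels, a worst-group squared error standing in for the OOD objective, and a grid search over a single group weight `alpha`. All function names here (`weighted_erm`, `worst_group_mse`, `search_weights`, `make_toy_data`) are hypothetical. The inner step is ordinary weighted empirical risk minimization; the outer step searches only over the weights, so the search space stays small regardless of the model.

```python
import random


def weighted_erm(xs, ys, ws):
    """Inner problem: weighted least squares for a scalar slope (closed form)."""
    num = sum(w * x * y for x, y, w in zip(xs, ys, ws))
    den = sum(w * x * x for x, w in zip(xs, ws))
    return num / den


def worst_group_mse(theta, xs, ys, groups):
    """Outer objective (an assumption of this sketch): largest per-group MSE."""
    losses = {}
    for x, y, g in zip(xs, ys, groups):
        losses.setdefault(g, []).append((theta * x - y) ** 2)
    return max(sum(v) / len(v) for v in losses.values())


def search_weights(xs, ys, groups, alphas):
    """Outer problem: choose the group weight whose weighted-ERM fit has the
    smallest worst-group loss. The search space is a single scalar."""
    best = None
    for alpha in alphas:
        # Samples in group 1 get weight alpha, samples in group 0 get 1 - alpha.
        ws = [alpha if g == 1 else 1.0 - alpha for g in groups]
        theta = weighted_erm(xs, ys, ws)
        score = worst_group_mse(theta, xs, ys, groups)
        if best is None or score < best[0]:
            best = (score, alpha, theta)
    return best


def make_toy_data(seed=0, n_major=180, n_minor=20):
    """Two groups with opposite slopes; group 1 is heavily under-represented."""
    rng = random.Random(seed)
    xs, ys, groups = [], [], []
    for g, n, slope in ((0, n_major, 1.0), (1, n_minor, -1.0)):
        for _ in range(n):
            x = rng.gauss(0.0, 1.0)
            xs.append(x)
            ys.append(slope * x + 0.1 * rng.gauss(0.0, 1.0))
            groups.append(g)
    return xs, ys, groups


xs, ys, groups = make_toy_data()

# Baseline: standard ERM, i.e. every sample has the same weight.
uniform_theta = weighted_erm(xs, ys, [1.0] * len(xs))
uniform_score = worst_group_mse(uniform_theta, xs, ys, groups)

# Bilevel search over the one-dimensional weight space.
alphas = [i / 20 for i in range(1, 20)]
best_score, best_alpha, best_theta = search_weights(xs, ys, groups, alphas)

print(f"uniform ERM: slope={uniform_theta:.3f}, worst-group MSE={uniform_score:.3f}")
print(f"reweighted:  alpha={best_alpha:.2f}, slope={best_theta:.3f}, "
      f"worst-group MSE={best_score:.3f}")
```

In this toy setting the uniform fit follows the majority group, while the selected weighting shifts mass toward the minority group and lowers the worst-group error. The method described in the abstract searches over sample weights for large models rather than over one group-level scalar, so the sketch should be read only as an illustration of the inner/outer structure.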

