Machine learning algorithms based on empirical risk minimization are vulnerable to distributional shifts because they greedily exploit all correlations found in the training data. Recently, robust learning methods have addressed this problem by minimizing the worst-case risk over an uncertainty set. However, these methods treat all covariates equally when forming the decision sets, regardless of the stability of their correlations with the target, resulting in an overwhelmingly large uncertainty set and low confidence of the learner. In this paper, we propose the Stable Adversarial Learning (SAL) algorithm, which leverages heterogeneous data sources to construct a more practical uncertainty set and conducts differentiated robustness optimization, where covariates are differentiated according to the stability of their correlations with the target. We theoretically show that our method is tractable for stochastic gradient-based optimization and provide performance guarantees for our method. Empirical studies on both simulated and real-world datasets validate the effectiveness of our method in terms of uniformly good performance across unknown distributional shifts.
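To make the worst-case formulation concrete, the following is a minimal sketch (not the authors' SAL algorithm) of adversarial risk minimization for linear regression, where the inner maximization perturbs the covariates within an L2 ball of radius `eps` (the uncertainty set) and the outer loop minimizes the resulting worst-case squared loss by gradient descent. All function names and hyperparameters here are illustrative assumptions; SAL additionally differentiates the perturbation budget per covariate according to the stability of its correlation with the target.

```python
import numpy as np

def worst_case_perturbation(X, y, w, eps):
    """One-step approximation of the inner max over delta with ||delta||_2 <= eps.

    For a loss linear in delta, the maximizer lies on the ball boundary in the
    direction of the per-sample gradient of the loss w.r.t. the covariates.
    """
    residual = X @ w - y                        # (n,)
    grad_X = residual[:, None] * w[None, :]     # d(loss)/dX, one row per sample
    norms = np.linalg.norm(grad_X, axis=1, keepdims=True) + 1e-12
    return eps * grad_X / norms                 # push each sample to the boundary

def adversarial_fit(X, y, eps=0.1, lr=0.05, steps=500):
    """Outer minimization: gradient descent on the adversarially perturbed loss."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        delta = worst_case_perturbation(X, y, w, eps)
        X_adv = X + delta                       # adversarial covariates
        grad_w = X_adv.T @ (X_adv @ w - y) / n  # gradient of mean squared loss
        w -= lr * grad_w
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=200)
w = adversarial_fit(X, y)
```

Compared with plain empirical risk minimization, the adversarial inner step acts as a data-dependent regularizer, shrinking the estimate toward coefficients that remain accurate under bounded covariate shift.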