Machine learning algorithms based on empirical risk minimization are vulnerable to distributional shifts because they greedily exploit all correlations found in the training data. There is an emerging literature that tackles this problem by minimizing the worst-case risk over an uncertainty set. However, existing methods mostly construct the uncertainty set by treating all variables equally, regardless of the stability of their correlations with the target, resulting in an overwhelmingly large uncertainty set and low confidence of the learner. In this paper, we propose a novel Stable Adversarial Learning (SAL) algorithm that leverages heterogeneous data sources to construct a more practical uncertainty set and to conduct differentiated robustness optimization, where covariates are differentiated according to the stability of their correlations with the target. We theoretically show that our method is tractable for stochastic gradient-based optimization and provide performance guarantees for our method. Empirical studies on both simulated and real datasets validate the effectiveness of our method in terms of uniformly good performance across unknown distributional shifts.
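To make the idea of differentiated robustness concrete, the following is a minimal illustrative sketch, not the authors' SAL implementation: linear regression under per-feature L∞ adversarial perturbations, where a hypothetical stability score `stab` (assumed here, in [0, 1]) assigns smaller adversarial budgets to stable covariates and larger budgets to unstable ones. For this loss, the inner maximization has a closed form, so the worst-case risk can be minimized directly by (sub)gradient descent.

```python
import numpy as np

def adversarial_risk(w, X, y, eps):
    """Worst-case mean squared loss under per-feature L-inf budgets eps.

    For linear regression, the adversary shifts each residual by at most
    sum_j eps_j * |w_j|, and the worst case aligns the shift with the
    residual's sign, giving the closed form (|r_i| + eps . |w|)^2.
    """
    resid = X @ w - y
    margin = np.abs(eps) @ np.abs(w)      # worst-case residual shift
    return np.mean((np.abs(resid) + margin) ** 2)

def sal_sketch(X, y, stab, lr=0.05, steps=500, base_eps=0.5):
    """Toy differentiated robust optimization (names/budget rule assumed).

    Unstable covariates (low stab score) get larger budgets, so their
    coefficients are penalized more heavily, analogous to a weighted
    L1 shrinkage induced by the worst-case objective.
    """
    eps = base_eps * (1.0 - np.asarray(stab, dtype=float))
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        resid = X @ w - y
        worst = np.abs(resid) + eps @ np.abs(w)
        # Subgradient of mean (|r_i| + eps.|w|)^2 with respect to w.
        grad = (2.0 / n) * (worst * np.sign(resid)) @ X \
             + 2.0 * worst.mean() * eps * np.sign(w)
        w -= lr * grad
    return w
```

A quick usage check: if one covariate is marked fully stable and the other fully unstable, the unstable one is shrunk toward zero even when both carry equal signal, which mirrors the abstract's point that robustness pressure should fall on unstable correlations rather than on all variables uniformly.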