Empirical risk minimization (ERM) is known in practice to be non-robust to distributional shift, where the training and test distributions differ. A suite of approaches, such as importance weighting and variants of distributionally robust optimization (DRO), have been proposed to solve this problem. But a line of recent work has empirically shown that these approaches do not significantly improve over ERM in real applications with distribution shift. The goal of this work is to obtain a comprehensive theoretical understanding of this intriguing phenomenon. We first posit the class of Generalized Reweighting (GRW) algorithms as a broad category of approaches that iteratively update model parameters based on iterative reweighting of the training samples. We show that when overparameterized models are trained under GRW, the resulting models are close to those obtained by ERM. We also show that adding a small regularization which does not greatly affect the empirical training accuracy does not help. Together, our results show that a broad category of what we term GRW approaches is not able to achieve distributionally robust generalization. Our work thus has the following sobering takeaway: to make progress towards distributionally robust generalization, we either have to develop non-GRW approaches, or perhaps devise novel classification/regression loss functions that are adapted to the class of GRW approaches.
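To make the GRW template concrete, the following is a minimal illustrative sketch of one instance of such an algorithm: alternating an exponentiated-gradient reweighting step that upweights high-loss samples (as in sample-level DRO variants) with a gradient step on the weighted empirical risk. The linear model, squared loss, and all hyperparameter names here are illustrative assumptions, not the paper's specific setup.

```python
import numpy as np

# Hypothetical GRW-style loop: linear model, squared loss (illustrative only).
rng = np.random.default_rng(0)
n, d = 20, 5
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

theta = np.zeros(d)
q = np.full(n, 1.0 / n)   # per-sample weights, initialized uniform
lr, eta = 0.1, 0.5        # model step size, reweighting rate (assumed values)

for _ in range(100):
    residuals = X @ theta - y
    losses = residuals ** 2
    # Reweighting step: exponentially upweight high-loss samples, then
    # renormalize so the weights stay on the probability simplex.
    q *= np.exp(eta * losses)
    q /= q.sum()
    # Model step: gradient descent on the reweighted empirical risk
    # sum_i q_i * (x_i^T theta - y_i)^2.
    grad = 2 * X.T @ (q * residuals)
    theta -= lr * grad

# Weights remain a valid distribution over the training samples.
assert np.all(q >= 0) and np.isclose(q.sum(), 1.0)
```

ERM is recovered as the special case where the weights are held fixed at the uniform distribution; importance weighting fixes them at (estimated) density ratios, and DRO-style methods adapt them adversarially as above.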