Importance weighting is a classic technique for handling distribution shifts. However, prior work has presented strong empirical and theoretical evidence demonstrating that importance weights can have little to no effect on overparameterized neural networks. Is importance weighting truly incompatible with the training of overparameterized neural networks? Our paper answers this in the negative. We show that importance weighting fails not because of overparameterization, but rather as a result of using exponentially-tailed losses such as the logistic or cross-entropy loss. As a remedy, we show that polynomially-tailed losses restore the effects of importance reweighting in correcting distribution shift in overparameterized models. We characterize the behavior of gradient descent on importance-weighted polynomially-tailed losses with overparameterized linear models, and theoretically demonstrate the advantage of polynomially-tailed losses in a label shift setting. Surprisingly, our theory shows that using weights obtained by exponentiating the classical unbiased importance weights can improve performance. Finally, we demonstrate the practical value of our analysis with neural network experiments on a subpopulation shift dataset and a label shift dataset. When reweighted, our loss function can outperform reweighted cross-entropy by as much as 9% in test accuracy. It also yields test accuracies comparable to, or even exceeding, well-tuned state-of-the-art methods for correcting distribution shifts.
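To make the tail distinction concrete, here is a minimal illustrative contrast as a function of the margin $z$; the polynomial form below is an assumed example for exposition, not necessarily the exact loss family analyzed in the paper:
\[
\ell_{\mathrm{exp}}(z) \;=\; \log\!\left(1 + e^{-z}\right) \;\approx\; e^{-z} \quad (z \to \infty),
\qquad
\ell_{\mathrm{poly}}(z) \;=\; \frac{1}{(1+z)^{\alpha}} \quad (z \ge 0,\ \alpha > 0).
\]
Under an exponential tail, per-example gradients vanish at a rate that eventually overwhelms any fixed importance weight as margins grow, so the weights become asymptotically irrelevant to the solution found by gradient descent; a polynomial tail decays slowly enough that the weights continue to influence the learned classifier.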