A common explanation for the failure of deep networks to generalize out-of-distribution is that they fail to recover the "correct" features. We challenge this notion with a simple experiment that suggests that ERM already learns sufficient features and that the current bottleneck is not feature learning, but robust regression. Our findings also imply that, given a small amount of data from the target distribution, retraining only the last linear layer will give excellent performance. We therefore argue that devising simpler methods for learning predictors on existing features is a promising direction for future research. Towards this end, we introduce Domain-Adjusted Regression (DARE), a convex objective for learning a linear predictor that is provably robust under a new model of distribution shift. Rather than learning one function, DARE performs a domain-specific adjustment to unify the domains in a canonical latent space and learns to predict in this space. Under a natural model, we prove that the DARE solution is the minimax-optimal predictor for a constrained set of test distributions. Further, we provide the first finite-environment convergence guarantee to the minimax risk, improving over existing analyses, which only yield minimax predictors after an environment threshold. Evaluated on finetuned features, we find that DARE compares favorably to prior methods, consistently achieving equal or better performance.
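The last-layer claim is concrete enough to sketch in code. The following is a minimal illustration, not the authors' implementation: it assumes a frozen `featurizer` taken from an ERM-trained network and a small labeled sample (`target_x`, `target_y`) from the target distribution, all hypothetical names, and refits only the final linear head on the frozen features.

```python
# Minimal sketch of last-layer retraining on frozen ERM features.
# `featurizer`, `target_x`, and `target_y` are hypothetical placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

def retrain_last_layer(featurizer, target_x, target_y):
    """Fit a fresh linear head on frozen features of a small target sample."""
    feats = featurizer(target_x)   # (n, d) features; the network stays frozen
    head = LogisticRegression(max_iter=1000)
    head.fit(feats, target_y)      # only the linear layer is (re)learned
    return head
```

Because the features are frozen, only a linear map is estimated, so a small target-distribution sample suffices; this is what lets the experiment separate feature quality from the quality of the final regression.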
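To make the DARE description concrete, here is one possible instantiation, a sketch based only on the abstract's wording rather than the authors' code: the "domain-specific adjustment" is assumed here to be per-domain whitening of the features (the abstract does not state this specific form), after which a single linear predictor, itself a convex problem, is fit in the shared canonical space.

```python
# Hedged sketch of a DARE-style pipeline: per-domain whitening (an assumed
# form of the "domain-specific adjustment") plus one shared linear head.
import numpy as np
from numpy.linalg import inv
from scipy.linalg import sqrtm
from sklearn.linear_model import LogisticRegression

def whiten(feats, eps=1e-6):
    """Adjust one domain: center its features and whiten by Sigma^{-1/2}."""
    mu = feats.mean(axis=0)
    cov = np.cov(feats, rowvar=False) + eps * np.eye(feats.shape[1])
    return (feats - mu) @ inv(sqrtm(cov)).real

def dare_fit(domain_feats, domain_labels):
    """Pool the adjusted domains into one canonical space; fit one predictor."""
    X = np.vstack([whiten(f) for f in domain_feats])
    y = np.concatenate(domain_labels)
    return LogisticRegression(max_iter=1000).fit(X, y)
```

Note that the adjustment is computed from each domain's own feature statistics, so at test time an analogous adjustment for the new domain would have to be estimated from unlabeled test data; the predictor itself remains a single convex (logistic) regression.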