Conventional approaches to robustness try to learn a model based on causal features. However, identifying maximally robust or causal features may be difficult in some scenarios, and in others, non-causal "shortcut" features may actually be more predictive. We propose a lightweight, sample-efficient approach that learns a diverse set of features and adapts to a target distribution by interpolating these features with a small target dataset. Our approach, Project and Probe (Pro$^2$), first learns a linear projection that maps a pre-trained embedding onto orthogonal directions while being predictive of labels in the source dataset. The goal of this step is to learn a variety of predictive features, so that at least some of them remain useful after distribution shift. Pro$^2$ then learns a linear classifier on top of these projected features using a small target dataset. We theoretically show that Pro$^2$ learns a projection matrix that is optimal for classification in an information-theoretic sense, resulting in better generalization due to a favorable bias-variance tradeoff. Our experiments on four datasets, with multiple distribution shift settings for each, show that Pro$^2$ improves performance by 5-15% when given limited target data compared to prior methods such as standard linear probing.
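The two-step procedure described above can be sketched in code. This is a minimal illustrative sketch, not the authors' implementation: it assumes the "project" step can be approximated by iterative deflation (fit a linear classifier on source data, take its normalized weight vector as one direction, remove that direction from the data, repeat), and the "probe" step is an ordinary logistic regression on the projected target features. The function name `project_and_probe` and all hyperparameters here are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def project_and_probe(X_src, y_src, X_tgt, y_tgt, X_eval, k=3):
    """Illustrative sketch of Pro^2 (not the paper's implementation).

    Step 1 (project): learn k mutually orthogonal directions on the source
    data, each predictive of the source labels, via iterative deflation.
    Step 2 (probe): fit a linear classifier on the projected features of a
    small target dataset, then predict on held-out evaluation data.
    """
    Xd = X_src.copy()
    dirs = []
    for _ in range(k):
        # Fit a linear classifier; its weight vector is one predictive direction.
        clf = LogisticRegression(max_iter=1000).fit(Xd, y_src)
        w = clf.coef_.ravel()
        w = w / np.linalg.norm(w)
        dirs.append(w)
        # Deflate: remove this direction so the next fitted direction is
        # (approximately) orthogonal to all previous ones.
        Xd = Xd - np.outer(Xd @ w, w)
    P = np.stack(dirs)  # (k, d) projection matrix with orthonormal rows

    # Probe: linear classifier on the k projected features, trained on the
    # small target dataset, evaluated on new target-distribution inputs.
    probe = LogisticRegression(max_iter=1000).fit(X_tgt @ P.T, y_tgt)
    return probe.predict(X_eval @ P.T)
```

Because the probe sees only `k` features instead of the full embedding dimension, it can be fit from far fewer target examples, which is the bias-variance tradeoff the abstract refers to.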