Consider a prediction setting with few in-distribution labeled examples and many unlabeled examples both in- and out-of-distribution (OOD). The goal is to learn a model which performs well both in-distribution and OOD. In these settings, auxiliary information is often cheaply available for every input. How should we best leverage this auxiliary information for the prediction task? Empirically across three image and time-series datasets, and theoretically in a multi-task linear regression setting, we show that (i) using auxiliary information as input features improves in-distribution error but can hurt OOD error; but (ii) using auxiliary information as outputs of auxiliary pre-training tasks improves OOD error. To get the best of both worlds, we introduce In-N-Out, which first trains a model with auxiliary inputs and uses it to pseudolabel all the in-distribution inputs, then pre-trains a model on OOD auxiliary outputs and fine-tunes this model with the pseudolabels (self-training). We show both theoretically and empirically that In-N-Out outperforms auxiliary inputs or outputs alone on both in-distribution and OOD error.
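The abstract above describes the In-N-Out pipeline in four steps: train with auxiliary inputs, pseudolabel the in-distribution data, pre-train on auxiliary outputs, and fine-tune with the pseudolabels. Below is a minimal sketch of these steps in the multi-task linear regression setting mentioned in the abstract; the data, dimensions, and helper names are hypothetical placeholders, not the paper's actual experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def lstsq(A, B):
    """Least-squares solve for W in A @ W ~= B."""
    return np.linalg.lstsq(A, B, rcond=None)[0]

# ---- Toy data (placeholder shapes and targets, for illustration only) ----
d_x, d_z = 10, 5                               # input dim, auxiliary-info dim
n_lab, n_unlab_id, n_ood = 30, 500, 500

X_lab = rng.normal(size=(n_lab, d_x))          # few labeled in-distribution inputs
Z_lab = rng.normal(size=(n_lab, d_z))          # auxiliary info (cheaply available)
y_lab = rng.normal(size=(n_lab, 1))            # labels (placeholder targets)
X_id  = rng.normal(size=(n_unlab_id, d_x))     # unlabeled in-distribution inputs
Z_id  = rng.normal(size=(n_unlab_id, d_z))
X_ood = rng.normal(size=(n_ood, d_x)) + 2.0    # unlabeled OOD inputs (shifted)
Z_ood = rng.normal(size=(n_ood, d_z))

# Step 1: aux-inputs model -- fit y on [x, z] using the labeled ID data.
W_auxin = lstsq(np.hstack([X_lab, Z_lab]), y_lab)

# Step 2: pseudolabel all in-distribution inputs with the aux-inputs model.
y_pseudo = np.hstack([X_id, Z_id]) @ W_auxin

# Step 3: pre-train on the aux-outputs task -- predict z from x on ID + OOD data.
X_all = np.vstack([X_lab, X_id, X_ood])
Z_all = np.vstack([Z_lab, Z_id, Z_ood])
W_pre = lstsq(X_all, Z_all)                    # shared representation x -> x @ W_pre

# Step 4: fine-tune -- fit a head on the pre-trained features using
# true labels plus pseudolabels (self-training).
feats  = np.vstack([X_lab, X_id]) @ W_pre
target = np.vstack([y_lab, y_pseudo])
W_head = lstsq(feats, target)

def predict(X):
    """In-N-Out prediction: pre-trained representation followed by fine-tuned head."""
    return X @ W_pre @ W_head
```

The sketch keeps the fine-tuned predictor free of auxiliary inputs at test time, so it can be applied OOD, while the auxiliary information still enters through the pseudolabels (step 2) and the pre-training task (step 3).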