In many real-world problems, the training data and the test data have different distributions, a situation commonly referred to as dataset shift. The settings most often considered in the literature are {\em covariate shift} and {\em target shift}. Importance weighting (IW) correction is a universal method for correcting the bias that arises in learning under dataset shift. A natural question is: does IW correction work equally well across different dataset shift scenarios? By investigating the generalization properties of weighted kernel ridge regression (W-KRR) under covariate and target shifts, we show that the answer is negative, except when the IW is bounded and the model is well-specified. In the latter case, minimax optimal rates are achieved by importance-weighted kernel ridge regression (IW-KRR) in both covariate and target shift scenarios. Slightly relaxing the boundedness condition on the IW, we show that IW-KRR still achieves the optimal rates under target shift, while leading to slower rates under covariate shift. In the case of model misspecification, we show that the performance of W-KRR under covariate shift can be substantially improved by designing an alternative reweighting function. The distinction between misspecified and well-specified scenarios does not seem to be crucialial in learning problems under target shift.
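The weighted KRR estimator discussed above can be sketched as follows. This is a minimal illustration, not the paper's method: it assumes a Gaussian RBF kernel and a known importance-weight function, and all names (`iw_krr_fit`, `iw_krr_predict`) and parameter choices are hypothetical. The fitted coefficients solve the weighted normal equations $(WK + \lambda I)\alpha = Wy$, which minimize the weighted empirical risk $\sum_i w_i (y_i - f(x_i))^2 + \lambda \|f\|^2$ over the RKHS.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    # Gaussian RBF kernel matrix between rows of X and rows of Z.
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def iw_krr_fit(X, y, w, lam=1e-2, gamma=1.0):
    # Weighted KRR: minimize sum_i w_i (y_i - f(x_i))^2 + lam * ||f||^2,
    # whose solution in the RKHS satisfies (W K + lam I) alpha = W y.
    # Under covariate shift, w_i would be p_test(x_i) / p_train(x_i).
    n = len(y)
    K = rbf_kernel(X, X, gamma)
    W = np.diag(w)
    alpha = np.linalg.solve(W @ K + lam * np.eye(n), W @ y)
    return alpha

def iw_krr_predict(X_train, X_test, alpha, gamma=1.0):
    # Evaluate the fitted function f(x) = sum_i alpha_i k(x, x_i).
    return rbf_kernel(X_test, X_train, gamma) @ alpha
```

With unit weights this reduces to ordinary KRR; the shift correction enters only through the diagonal weight matrix $W$, which is why bounded versus unbounded weights matter for the resulting rates.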