We study the covariate shift problem in the context of nonparametric regression over a reproducing kernel Hilbert space (RKHS). We focus on two natural families of covariate shift problems, each defined through the likelihood ratio between the target and source distributions. When the likelihood ratio is uniformly bounded, we prove that the kernel ridge regression (KRR) estimator with a carefully chosen regularization parameter is minimax rate-optimal (up to a log factor) for a large family of RKHSs with regular kernel eigenvalues. Interestingly, KRR requires no knowledge of the likelihood ratio beyond an upper bound on it. In striking contrast to the standard statistical setting without covariate shift, we also demonstrate that a naïve estimator, which minimizes the empirical risk over the function class, is strictly suboptimal under covariate shift compared with KRR. We then address the larger class of covariate shift problems in which the likelihood ratio is possibly unbounded yet has a finite second moment. Here, we show via careful simulations that KRR fails to attain the optimal rate. Instead, we propose a reweighted KRR estimator that weights samples based on a careful truncation of the likelihood ratios. We show that this estimator is again minimax optimal, up to logarithmic factors.
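To make the two estimators concrete, the following is a minimal sketch of plain KRR and of a reweighted KRR whose sample weights are truncated likelihood ratios. It is an illustration under stated assumptions rather than the construction analyzed in the paper: the Gaussian kernel, its bandwidth, the regression function, the source and target distributions, the regularization parameter `lam`, and the truncation level `tau` are all hypothetical choices, and the function names are introduced here only for the example.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=0.5):
    """Gaussian kernel matrix between two 1-D point sets (assumed kernel)."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff**2 / (2.0 * bandwidth**2))


def truncated_weights(ratio, tau):
    """Truncate likelihood ratios at level tau: min(rho(x_i), tau)."""
    return np.minimum(ratio, tau)


def fit_weighted_krr(x, y, weights, lam, bandwidth=0.5):
    """Minimize (1/n) sum_i w_i (f(x_i) - y_i)^2 + lam * ||f||_H^2.

    With f = sum_j alpha_j k(., x_j), a minimizer is obtained from
    (W K + n * lam * I) alpha = W y, where W = diag(weights).
    Setting all weights to 1 recovers ordinary KRR.
    """
    n = len(x)
    K = gaussian_kernel(x, x, bandwidth)
    A = weights[:, None] * K + n * lam * np.eye(n)
    alpha = np.linalg.solve(A, weights * y)
    return lambda x_new: gaussian_kernel(x_new, x, bandwidth) @ alpha


def target_fn(x):
    """Hypothetical regression function used only for this demo."""
    return np.sin(2.0 * x)


rng = np.random.default_rng(0)
n = 200
sigma_q = 1.2  # target Q = N(0, 1.2^2); source P = N(0, 1) (both assumed)

# Labeled samples are drawn from the source distribution P.
x = rng.normal(0.0, 1.0, size=n)
y = target_fn(x) + 0.1 * rng.normal(size=n)

# Likelihood ratio dQ/dP for these two Gaussians. It is unbounded in x
# but has a finite second moment under P because sigma_q^2 < 2.
ratio = np.exp(-x**2 / (2.0 * sigma_q**2) + x**2 / 2.0) / sigma_q

tau = 2.0   # illustrative truncation level, not a tuned or prescribed value
lam = 1e-3  # illustrative regularization parameter

f_plain = fit_weighted_krr(x, y, np.ones(n), lam)
f_trunc = fit_weighted_krr(x, y, truncated_weights(ratio, tau), lam)

# Evaluate both fits on fresh covariates drawn from the target Q.
x_test = rng.normal(0.0, sigma_q, size=1000)
for name, f in [("plain KRR", f_plain), ("truncated-reweighted KRR", f_trunc)]:
    mse = np.mean((f(x_test) - target_fn(x_test)) ** 2)
    print(f"{name}: target-distribution MSE = {mse:.4f}")
```

The weights enter only through the diagonal matrix in the linear system, so the unweighted estimator is the special case in which every weight equals one. Truncation caps the influence of any single source sample whose likelihood ratio is very large.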