We address the problem of unsupervised domain adaptation when the source domain differs from the target domain because of a shift in the distribution of a latent subgroup. When this subgroup confounds all observed data, neither covariate shift nor label shift assumptions apply. We show that the optimal target predictor can be non-parametrically identified with the help of concept and proxy variables available only in the source domain, and unlabeled data from the target. The identification results are constructive, immediately suggesting an algorithm for estimating the optimal predictor in the target. For continuous observations, when this algorithm becomes impractical, we propose a latent variable model specific to the data generation process at hand. We show how the approach degrades as the size of the shift changes, and verify that it outperforms both covariate and label shift adjustment.
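To make the setting concrete, the following is a minimal sketch of latent subgroup shift on a toy discrete problem. It is not the estimation procedure described above: it assumes an oracle that knows the conditionals P(x | u) and P(y | x, u) and the target marginal q(u), whereas the abstract's approach recovers the needed quantities from concept and proxy variables and unlabeled target data. All names and probability values (`P_X_GIVEN_U`, `P_Y1_GIVEN_XU`, `SOURCE_P_U`, `TARGET_P_U`) are hypothetical and chosen only for illustration. The sketch shows that when only the distribution of the latent subgroup U changes, both P(y | x) and P(x | y) change between domains, so neither the covariate shift nor the label shift assumption holds, while reweighting the latent posterior gives the target predictor q(y | x) = Σ_u P(y | x, u) q(u | x).

```python
from itertools import product

# Hypothetical conditionals shared by source and target (illustrative values only).
P_X_GIVEN_U = {0: [0.8, 0.2], 1: [0.3, 0.7]}  # P_X_GIVEN_U[u][x] = P(X=x | U=u)
P_Y1_GIVEN_XU = {(0, 0): 0.1, (1, 0): 0.4, (0, 1): 0.6, (1, 1): 0.9}  # key (x, u) -> P(Y=1 | x, u)

# Only the marginal of the latent subgroup U differs between domains.
SOURCE_P_U = [0.9, 0.1]
TARGET_P_U = [0.2, 0.8]


def joint(p_u):
    """Joint P(u, x, y) for a given marginal over the latent subgroup U."""
    table = {}
    for u, x, y in product((0, 1), repeat=3):
        p_y1 = P_Y1_GIVEN_XU[(x, u)]
        p_y = p_y1 if y == 1 else 1.0 - p_y1
        table[(u, x, y)] = p_u[u] * P_X_GIVEN_U[u][x] * p_y
    return table


def p_y1_given_x(p_u, x):
    """P(Y=1 | X=x), marginalizing U out of the joint."""
    j = joint(p_u)
    num = sum(j[(u, x, 1)] for u in (0, 1))
    den = sum(j[(u, x, y)] for u in (0, 1) for y in (0, 1))
    return num / den


def p_x1_given_y(p_u, y):
    """P(X=1 | Y=y), marginalizing U out of the joint."""
    j = joint(p_u)
    num = sum(j[(u, 1, y)] for u in (0, 1))
    den = sum(j[(u, x, y)] for u in (0, 1) for x in (0, 1))
    return num / den


def adapted_p_y1_given_x(q_u, x):
    """Oracle target predictor: sum_u P(Y=1 | x, u) * q(u | x),
    with q(u | x) proportional to P(x | u) * q(u)."""
    weights = [P_X_GIVEN_U[u][x] * q_u[u] for u in (0, 1)]
    total = sum(weights)
    return sum(P_Y1_GIVEN_XU[(x, u)] * w / total for u, w in zip((0, 1), weights))


if __name__ == "__main__":
    for x in (0, 1):
        print(
            f"x={x}: source P(y=1|x)={p_y1_given_x(SOURCE_P_U, x):.3f}, "
            f"target P(y=1|x)={p_y1_given_x(TARGET_P_U, x):.3f}, "
            f"adapted={adapted_p_y1_given_x(TARGET_P_U, x):.3f}"
        )
    for y in (0, 1):
        print(
            f"y={y}: source P(x=1|y)={p_x1_given_y(SOURCE_P_U, y):.3f}, "
            f"target P(x=1|y)={p_x1_given_y(TARGET_P_U, y):.3f}"
        )
```

In this toy example the source predictor P(y | x) is miscalibrated for the target, and the class-conditional P(x | y) also moves, so importance weighting on x or on y alone would not correct the shift. The adapted predictor matches the true target conditional only because U's conditionals are given here; in practice U is unobserved, which is the gap the proxy- and concept-based identification is meant to close.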