We study the minimax rates of the label shift problem in non-parametric classification. In addition to the unsupervised setting in which the learner only has access to unlabeled examples from the target domain, we also consider the setting in which a small number of labeled examples from the target domain is available to the learner. Our study reveals a difference in the difficulty of the label shift problem in the two settings, and we attribute this difference to the availability of data from the target domain to estimate the class conditional distributions in the latter setting. We also show that a class proportion estimation approach is minimax rate-optimal in the unsupervised setting.
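For orientation, the label shift assumption and the classifier correction that class proportion estimation enables can be written as below. This is the standard textbook formulation and not necessarily the exact estimator analyzed in the paper. The notation is an assumption introduced here: $P$ for the source distribution, $Q$ for the target distribution, $\pi^P_k$ and $\pi^Q_k$ for the class proportions, and $w_k$ for the importance weights.

```latex
% Label shift: class-conditional distributions agree, class proportions may differ.
\[
  P(x \mid Y = k) = Q(x \mid Y = k) \quad \text{for all classes } k,
  \qquad
  \pi^P_k = P(Y = k), \quad \pi^Q_k = Q(Y = k).
\]
% By Bayes' rule, the target posterior is a reweighted source posterior.
\[
  Q(Y = k \mid x)
  = \frac{w_k \, P(Y = k \mid x)}{\sum_{j} w_j \, P(Y = j \mid x)},
  \qquad
  w_k = \frac{\pi^Q_k}{\pi^P_k}.
\]
```

Under this formulation, the source proportions $\pi^P_k$ can be estimated from labeled source data, so in the unsupervised setting the remaining task is to estimate the target proportions $\pi^Q_k$ from unlabeled target examples.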