The quantification problem consists of determining the prevalence of a given label in a target population. However, one often has access to labels in a sample from the training population but not in the target population. A common assumption in this situation is prior probability shift: conditional on the label, the distribution of the features is the same in the training and target populations. In this paper, we derive a new lower bound for the risk of the quantification problem under the prior shift assumption. Complementing this lower bound, we present a new approximately minimax class of estimators, ratio estimators, which generalize several previous proposals in the literature. Using a weaker, testable version of the prior shift assumption, we show that ratio estimators can be used to build confidence intervals for the quantification problem. We also extend the ratio estimator so that it can (i) incorporate labels from the target population, when they are available, and (ii) estimate how the prevalence of positive labels varies according to a function of certain covariates.
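To make the idea of a ratio estimator concrete, the following is a minimal sketch, not the paper's implementation. The abstract does not state the formula, so the sketch assumes a common form for binary labels: pick a function g of the features, and estimate the target prevalence as (mean of g on the target sample minus mean of g among training negatives) divided by (mean of g among training positives minus mean of g among training negatives). This identity follows from prior probability shift, since the target mean of g is then a mixture of the two class-conditional means. The function name `ratio_estimator`, the choice of g as a threshold rule, and the simulated Gaussian data are all illustrative assumptions.

```python
import numpy as np


def ratio_estimator(g_train, y_train, g_target):
    """Estimate the prevalence of label 1 in the target sample.

    g_train  : values of g(x) on the labeled training sample
    y_train  : binary labels (0/1) of the training sample
    g_target : values of g(x) on the unlabeled target sample

    Assumed form (hypothetical, see lead-in):
        theta = (mean g on target - mu0) / (mu1 - mu0),
    where mu_y is the training mean of g among examples with label y.
    """
    g_train = np.asarray(g_train, dtype=float)
    y_train = np.asarray(y_train)
    g_target = np.asarray(g_target, dtype=float)

    mu1 = g_train[y_train == 1].mean()
    mu0 = g_train[y_train == 0].mean()
    if np.isclose(mu1, mu0):
        raise ValueError("g does not separate the classes; estimate undefined")

    theta = (g_target.mean() - mu0) / (mu1 - mu0)
    # Clip because sampling noise can push the raw ratio outside [0, 1].
    return float(np.clip(theta, 0.0, 1.0))


# Simulated prior shift: X | Y=y ~ Normal(2*y, 1) in both populations,
# but the prevalence of Y=1 is 0.5 in training and 0.2 in the target.
rng = np.random.default_rng(0)
n = 20_000

y_train = (rng.random(n) < 0.5).astype(int)
x_train = rng.normal(loc=2.0 * y_train, scale=1.0)

y_target = (rng.random(n) < 0.2).astype(int)  # unobserved in practice
x_target = rng.normal(loc=2.0 * y_target, scale=1.0)


def g(x):
    # Illustrative choice of g: a fixed threshold classifier.
    return (x > 1.0).astype(float)


theta_hat = ratio_estimator(g(x_train), y_train, g(x_target))
naive = g(x_target).mean()  # plain "classify and count", biased under shift
print(f"ratio estimate: {theta_hat:.3f}, classify-and-count: {naive:.3f}")
```

With g taken to be the output of a hard classifier, this form coincides with the familiar adjusted classify-and-count correction. Other choices of g, such as a classifier's predicted probability, give other members of the same family.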