This paper considers binary and multilabel classification problems in a setting where labels are missing independently and at a known rate. Missing labels are a ubiquitous phenomenon in extreme multi-label classification (XMC) tasks, such as matching Wikipedia articles to a small subset of the hundreds of thousands of possible tags, where no human annotator can possibly check the validity of all the negative samples. For this reason, propensity-scored precision -- an unbiased estimate of precision-at-k under a known noise model -- has become one of the standard metrics in XMC. Few methods take this problem into account already during the training phase, and all of them are limited to loss functions that decompose into a sum of contributions from each individual label. A typical approach to training is to reduce the multilabel problem to a series of binary or multiclass problems, and it has been shown that if the surrogate task is to be consistent for optimizing recall, the resulting loss function is not decomposable over labels. Therefore, this paper derives the unique unbiased estimators for the different multilabel reductions, including the non-decomposable ones. These estimators suffer from increased variance and may lead to ill-posed optimization problems, which we address by switching to convex upper bounds. The theoretical considerations are further supplemented by an experimental study showing that the switch to unbiased estimators significantly alters the bias-variance trade-off and may thus require stronger regularization, which in some cases can negate the benefits of unbiased estimation.
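To make the noise model concrete, the sketch below illustrates the two reweighting constructions the abstract alludes to: propensity scoring as an unbiased estimate of precision-at-k, and the analogous unbiased estimate of a decomposable binary loss (the construction that prior training-time methods rely on). It assumes per-label propensities p_l (the probability that a true positive label is observed) are known; the function names `psp_at_k` and `unbiased_binary_loss` are hypothetical, chosen for illustration, and this is not the paper's non-decomposable estimator.

```python
import numpy as np

def psp_at_k(scores, observed, propensity, k=5):
    """Propensity-scored precision@k for a single instance.

    scores     : (L,) model scores, one per label
    observed   : (L,) binary indicators of *observed* positive labels
    propensity : (L,) probability p_l that a true positive is observed
    """
    top_k = np.argsort(-scores)[:k]
    # Reweighting each observed positive by 1/p_l makes this an unbiased
    # estimate of precision@k when positives go missing independently
    # with probability 1 - p_l.
    return (observed[top_k] / propensity[top_k]).sum() / k

def unbiased_binary_loss(loss, score, observed, p):
    """Unbiased estimate of E[loss(score, true_label)] from an observed label.

    loss     : callable loss(score, label) for labels in {0, 1}
    observed : observed label in {0, 1}; a true positive is seen w.p. p
    """
    if observed == 1:
        # An observed positive is certainly a true positive, but each true
        # positive is only seen with probability p, so its loss is upweighted
        # by 1/p and the negative-label loss it would otherwise have been
        # charged as a missing label is subtracted.
        return loss(score, 1) / p + (1.0 - 1.0 / p) * loss(score, 0)
    # An observed zero may be a true negative or a missing positive; in
    # expectation the correction on observed positives accounts for both.
    return loss(score, 0)
```

Note how the 1/p weights blow up as propensities shrink; this is the increased variance of unbiased estimators that the abstract refers to, and it is what motivates the switch to convex upper bounds and stronger regularization.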