Positive-unlabeled (PU) learning is a special case of semi-supervised binary classification in which only a fraction of the positive examples are labeled. The challenge is to find the correct classifier despite this lack of information. Recently, new methodologies have been introduced to address the case where the probability of being labeled may depend on the covariates. In this paper, we establish risk bounds for PU learning under this general assumption. In addition, we quantify the impact of label noise on PU learning compared to the standard classification setting. Finally, we provide a lower bound on the minimax risk, proving that the upper bound is nearly optimal.
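To make the setting concrete, the following is a minimal, hypothetical sketch (not from the paper) of generating PU data where the labeling probability depends on the covariate: the true label y is hidden, and we only observe the labeling indicator s, which can be 1 only for positives. The propensity function e(x) chosen here (a logistic in x) is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated PU data: true labels y are never observed directly;
# we only see s = 1 for a subset of positives.
n = 10_000
x = rng.normal(size=n)

# True class: positive when x plus noise exceeds 0.
y = (x + rng.normal(size=n) > 0).astype(int)

# Covariate-dependent labeling propensity e(x): positives with
# larger x are more likely to be labeled (illustrative choice).
e_x = 1.0 / (1.0 + np.exp(-2.0 * x))

# Observed labeling indicator: s = 1 only if the example is a
# positive AND it is selected for labeling with probability e(x).
s = ((rng.uniform(size=n) < e_x) & (y == 1)).astype(int)

# In practice the learner sees only (x, s): s = 1 examples are
# known positives, s = 0 examples are an unlabeled mix of
# positives and negatives.
print("P(s=1) =", s.mean(), " P(y=1) =", y.mean())
```

Note that P(s=1) is strictly smaller than P(y=1): every labeled example is positive, but not every positive is labeled, and here which positives get labeled is biased by x rather than selected completely at random.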