Quantification is the research field that studies methods for counting the number of data points that belong to each class in an unlabeled sample. Traditionally, researchers in this field assume the availability of labeled observations for all classes to induce a quantification model. However, we often face situations in which the number of classes is large or even unknown, or in which reliable data are available for only a single class. When inducing a multi-class quantifier is infeasible, we are often interested in estimates for a specific class of interest. In this context, we have proposed a novel setting known as One-class Quantification (OCQ). Meanwhile, Positive and Unlabeled Learning (PUL), another branch of Machine Learning, has offered solutions that can be applied to OCQ, even though quantification is not the focal point of PUL. This article closes the gap between PUL and OCQ, bringing both areas together under a unified view. We compare our method, Passive Aggressive Threshold (PAT), against PUL methods and show that PAT is generally the fastest and most accurate algorithm. PAT induces quantification models that can be reused to quantify different samples of data. We additionally introduce Exhaustive TIcE (ExTIcE), an improved version of the PUL algorithm Tree Induction for c Estimation (TIcE). We show that ExTIcE quantifies more accurately than PAT and the other assessed algorithms in scenarios where several negative observations are identical to positive ones.