In recent years, active learning has become a popular paradigm for using user feedback to improve the accuracy of learning algorithms. Active learning works by selecting the most informative sample from the unlabeled data and querying the user for its label. Many methods, such as uncertainty sampling and minimum risk sampling, have been used to select the most informative sample. Although many active learning algorithms have been proposed, most of them target binary or multi-class classification and therefore cannot be applied to problems in which only samples from one class and a set of unlabeled data are available. Such problems arise in many real-world situations and are known as learning from positive and unlabeled data. In this paper we propose an active learning algorithm that works when only samples of one class and a set of unlabeled data are available. Our method separately estimates the probability densities of the positive and unlabeled points and then computes the expected value of informativeness, which eliminates a hyper-parameter and yields a better measure of informativeness. Experiments and empirical analysis show promising results compared to other similar methods.
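The selection step described above can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not say which hyper-parameter is averaged out or how informativeness is defined, so the sketch assumes the hyper-parameter is the positive class prior and uses a simple uncertainty score. It also assumes one-dimensional data, a fixed kernel bandwidth, and that the unlabeled points are drawn from the marginal distribution. The names `gaussian_kde`, `expected_uncertainty`, and `select_query` are hypothetical.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a 1-D Gaussian kernel density estimate built from `samples`."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def expected_uncertainty(x, p_pos, p_unl, priors):
    """Average an uncertainty score over candidate class priors.

    Assumption: the hyper-parameter being averaged out is the positive
    class prior. The abstract does not confirm this.
    """
    ratio = p_pos(x) / max(p_unl(x), 1e-12)  # guard against division by zero
    total = 0.0
    for prior in priors:
        # Bayes' rule estimate of P(y=1 | x), clipped to a valid probability.
        posterior = min(1.0, prior * ratio)
        # Score is 1 when the posterior is 0.5 and 0 when it is 0 or 1.
        total += 1.0 - abs(2.0 * posterior - 1.0)
    return total / len(priors)


def select_query(positives, unlabeled, bandwidth=0.5, n_priors=9):
    """Pick the unlabeled point with the highest expected uncertainty."""
    p_pos = gaussian_kde(positives, bandwidth)  # density of positive points
    p_unl = gaussian_kde(unlabeled, bandwidth)  # density of unlabeled points
    # Evenly spaced priors strictly inside (0, 1).
    priors = [(k + 1) / (n_priors + 1) for k in range(n_priors)]
    return max(
        unlabeled,
        key=lambda x: expected_uncertainty(x, p_pos, p_unl, priors),
    )
```

Calling `select_query([0.0, 0.1, -0.1, 0.2], [0.0, 0.1, 1.0, 3.0, 3.1, 2.9])` returns the unlabeled point whose score, averaged over the grid of priors, is highest. Averaging over a grid is only one way to avoid fixing the prior by hand; a different weighting over priors would change which point is selected.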