Conventional active learning algorithms assume a single labeler that produces noiseless labels at a given, fixed cost, and aim to achieve the best generalization performance for a given classifier under a budget constraint. However, in many real settings, different labelers have different labeling costs and different labeling accuracies. Moreover, a given labeler may exhibit different labeling accuracies for different instances. We refer to this setting as active learning with diverse labelers with varying costs and accuracies. It is therefore beneficial to understand how to trade off labeling accuracy for different instances, labeling cost, and the informativeness of training instances, so as to achieve the best generalization performance at the lowest labeling cost. In this paper, we propose a new algorithm for selecting instances and labelers (with their corresponding costs and labeling accuracies). The algorithm employs a generalization bound for learning with label noise to select informative instances and labelers, so as to achieve higher generalization accuracy at a lower cost. Our proposed algorithm demonstrates state-of-the-art performance on five UCI datasets and a real crowdsourcing dataset.
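The trade-off described in the abstract, between instance informativeness, per-instance labeler accuracy, and labeling cost, can be illustrated with a toy selection rule. The sketch below is not the proposed algorithm, which relies on a generalization bound for learning with label noise that the abstract does not spell out. It assumes binary labels, uses predictive entropy as a stand-in for informativeness, and discounts each labeler by the factor (1 - 2η)², a common heuristic for symmetric label noise with rate η. The names `select_pair`, `entropy`, and the data layout are hypothetical.

```python
import math


def entropy(p):
    """Binary entropy (in bits) of a predicted positive-class probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_pair(probs, labelers, budget):
    """Pick the (instance index, labeler name) pair with the best score.

    probs:    current classifier's positive-class probability per instance.
    labelers: name -> (cost, per-instance noise rates); assumed layout.
    budget:   remaining labeling budget; unaffordable labelers are skipped.
    Returns None if no affordable pair has a positive score.
    """
    best, best_score = None, 0.0
    for i, p in enumerate(probs):
        info = entropy(p)  # stand-in for informativeness (assumption)
        for name, (cost, noise_rates) in labelers.items():
            if cost > budget:
                continue
            # Heuristic discount for symmetric label noise (assumption,
            # not the bound used in the paper).
            quality = (1.0 - 2.0 * noise_rates[i]) ** 2
            score = info * quality / cost
            if score > best_score:
                best, best_score = (i, name), score
    return best


# Two unlabeled instances and two hypothetical labelers.
probs = [0.5, 0.9]
labelers = {
    "expert": (4.0, [0.0, 0.0]),  # costly, noiseless
    "crowd": (1.0, [0.2, 0.4]),   # cheap, noise varies by instance
}
print(select_pair(probs, labelers, budget=10.0))
```

With these toy numbers the cheap labeler wins on the most uncertain instance, because its cost advantage outweighs its 20% noise rate there. Raising that noise rate toward 0.5 shifts the choice to the expert.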