Most classical Active Learning (AL) approaches use statistical measures such as entropy and margin to quantify instance utility, but these measures fail to capture the data distribution information contained in the unlabeled data. As a result, the classifier may eventually select outlier instances to label. Meanwhile, the loss associated with mislabeling an instance in a typical classification task is much higher than the loss associated with the opposite error. To address these challenges, we propose Cost-Based Budget Active Learning (CBAL), which considers both the classification uncertainty and the instance diversity in a population constrained by a budget. A principled min-max approach is used to minimize both the labeling cost and the decision cost of the selected instances, ensuring near-optimal results with significantly less computational effort. Extensive experimental results show that the proposed approach outperforms several state-of-the-art active learning approaches.
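For readers unfamiliar with the entropy and margin measures mentioned above, the following is a minimal sketch of classical uncertainty sampling, not an implementation of CBAL. The function names, the toy `pool` of predicted class probabilities, and the `budget` parameter are illustrative assumptions and do not come from the paper.

```python
import math

def entropy_score(probs):
    """Shannon entropy of a predicted class distribution (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def margin_score(probs):
    """Gap between the two most probable classes (smaller = more uncertain)."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second

def select_by_entropy(pool_probs, budget):
    """Indices of the `budget` highest-entropy instances (hypothetical helper)."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy_score(pool_probs[i]),
        reverse=True,
    )
    return ranked[:budget]

# Toy pool: each row is an assumed classifier output for one unlabeled instance.
pool = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.40, 0.35, 0.25],  # probability spread over all classes
    [0.50, 0.49, 0.01],  # near tie between two classes
]

chosen = select_by_entropy(pool, budget=2)
most_uncertain_by_margin = min(range(len(pool)), key=lambda i: margin_score(pool[i]))
```

On this toy pool the two measures disagree about the single most uncertain instance: entropy ranks the second row highest, while margin ranks the third. Both scores depend only on the classifier's predicted probabilities for one instance at a time, so neither reflects where that instance lies in the unlabeled data distribution, which is the limitation the abstract points to.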