We design an active learning algorithm for cost-sensitive multiclass classification: problems where different errors have different costs. Our algorithm, COAL, makes predictions by regressing to each label's cost and predicting the smallest. On a new example, it uses a set of regressors that perform well on past data to estimate possible costs for each label. It queries only the labels that could be the best, ignoring the sure losers. We prove COAL can be efficiently implemented for any regression family that admits squared loss optimization; it also enjoys strong guarantees with respect to predictive performance and labeling effort. We empirically compare COAL to passive learning and several active learning baselines, showing significant improvements in labeling effort and test cost on real-world datasets.
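A minimal sketch of the could-be-the-best query rule described above, under simplifying assumptions: `good_regressors` stands in for the set of cost regressors that perform well on past data, and per-label cost ranges are taken as the min/max predictions over that set. All names are hypothetical and this is an illustration of the idea, not the paper's implementation.

```python
# Illustrative sketch (not the COAL implementation): a label is queried only if,
# under some plausible cost assignment, it could still have the smallest cost.
import numpy as np


def labels_to_query(good_regressors, x, num_labels):
    """Return the labels worth querying for example x.

    good_regressors: hypothetical set of cost regressors f(x, y) -> predicted cost,
    assumed to be those consistent with (performing well on) past data.
    """
    # Predicted cost of every label under every plausible regressor.
    preds = np.array([[f(x, y) for y in range(num_labels)]
                      for f in good_regressors])
    lo = preds.min(axis=0)   # most optimistic cost per label
    hi = preds.max(axis=0)   # most pessimistic cost per label

    query = []
    for y in range(num_labels):
        # Label y is a "sure loser" if some other label beats it under every
        # plausible cost; otherwise it could still be the best, so query it.
        others_hi = np.delete(hi, y)
        if lo[y] <= others_hi.min():
            query.append(y)
    return query


if __name__ == "__main__":
    # Toy usage: three linear cost "regressors" over four labels.
    rng = np.random.default_rng(0)
    weights = [rng.normal(size=(4, 3)) for _ in range(3)]
    regressors = [lambda x, y, W=W: float(W[y] @ x) for W in weights]
    print(labels_to_query(regressors, x=np.ones(3), num_labels=4))
```

The sketch mirrors the abstract's description only: in practice the set of well-performing regressors and the per-label cost ranges would be maintained with the guarantees proved for COAL, rather than by enumerating regressors as done here.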