We consider a batch active learning scenario where the learner adaptively issues batches of points to a labeling oracle. Sampling labels in batches is highly desirable in practice because it reduces the number of interactive rounds with the labeling oracle (often human beings). However, batch active learning typically pays the price of reduced adaptivity, leading to suboptimal results. In this paper we propose a solution that requires a careful trade-off between the informativeness of the queried points and their diversity. We theoretically investigate batch active learning in the practically relevant scenario where the unlabeled pool of data is available beforehand (pool-based active learning). We analyze a novel stage-wise greedy algorithm and show that, as a function of the label complexity, the excess risk of this algorithm operating in the realizable setting matches the known minimax rates in standard statistical learning settings. Our results also exhibit a mild dependence on the batch size. These are the first theoretical results that employ careful trade-offs between informativeness and diversity to rigorously quantify the statistical performance of batch active learning in the pool-based scenario.