We consider a batch active learning scenario where the learner adaptively issues batches of points to a labeling oracle. Sampling labels in batches is highly desirable in practice because it reduces the number of interactive rounds with the labeling oracle (often a human annotator). However, batch active learning typically pays the price of reduced adaptivity, which can lead to suboptimal results. In this paper we propose a solution that carefully trades off the informativeness of the queried points against their diversity. We theoretically investigate batch active learning in the practically relevant scenario where the unlabeled pool of data is available beforehand ({\em pool-based} active learning). We analyze a novel stage-wise greedy algorithm and show that, as a function of the label complexity, the excess risk of this algorithm matches the known minimax rates in standard statistical learning settings. Our results also exhibit only a mild dependence on the batch size. These are the first theoretical results that rigorously quantify the statistical performance of batch active learning in the pool-based scenario by carefully trading off informativeness and diversity.
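To make the informativeness--diversity trade-off concrete, the sketch below shows a generic stage-wise greedy batch selection rule of the kind the abstract alludes to. It is a minimal illustration under our own assumptions, not the paper's actual algorithm: the scoring function (margin uncertainty plus a distance-based diversity bonus), the `predict_proba` callable, and the `trade_off` parameter are all hypothetical choices introduced only for this example.

```python
# Hypothetical sketch (not the paper's algorithm): stage-wise greedy batch
# selection from an unlabeled pool, trading off informativeness (here, margin
# uncertainty of the current model) against diversity (distance to points
# already chosen for the batch).
import numpy as np

def select_batch(pool_X, predict_proba, batch_size, trade_off=1.0):
    """Greedily pick `batch_size` indices from the unlabeled pool.

    pool_X        : (n, d) array of unlabeled feature vectors.
    predict_proba : callable mapping (n, d) -> (n,) probabilities of class 1.
    trade_off     : weight on the diversity term (illustrative parameter).
    """
    probs = predict_proba(pool_X)
    # High when the model is uncertain (probability near 0.5).
    informativeness = 1.0 - np.abs(2.0 * probs - 1.0)
    chosen = []
    for _ in range(batch_size):
        if not chosen:
            diversity = np.zeros(len(pool_X))
        else:
            # Distance of each pool point to its closest already-chosen point.
            dists = np.linalg.norm(
                pool_X[:, None, :] - pool_X[chosen][None, :, :], axis=2
            )
            diversity = dists.min(axis=1)
        score = informativeness + trade_off * diversity
        score[chosen] = -np.inf  # never re-select a point in the batch
        chosen.append(int(np.argmax(score)))
    return chosen
```

In this illustrative rule, each stage adds the point that is simultaneously uncertain for the current model and far from the points already placed in the batch, which is the kind of balance between informativeness and diversity that the abstract describes.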