One of the biggest challenges in applied supervised machine learning is the need for large amounts of labeled data. Active Learning (AL) is a well-established method for obtaining labeled data efficiently: a query strategy selects the most informative samples to be labeled first. Although many query strategies have been proposed, none has yet proven clearly superior across all domains. Additionally, many strategies are computationally expensive, which further hinders the widespread use of AL in large-scale annotation projects. We therefore propose ImitAL, a novel query strategy that encodes AL as a learning-to-rank problem. The underlying neural network is trained with Imitation Learning; the demonstrative expert experience required for training is generated from purely synthetic data. To show the general and superior applicability of ImitAL, we perform an extensive evaluation on 15 datasets from a wide range of domains, comparing our strategy with 10 state-of-the-art query strategies. We also show that our approach has lower runtime than most other strategies, especially on very large datasets.