Active Learning (AL) is a well-established method for obtaining annotated data efficiently by first labeling those samples that carry the most information according to a query strategy. A large variety of such query strategies has been proposed in the past, with each new generation of strategies adding runtime and complexity. However, to the best of our knowledge, none of these strategies performs consistently well across a large number of datasets from different application domains. Essentially, most existing AL strategies are a combination of the two simple heuristics informativeness and representativeness, and the main differences lie in how these often conflicting heuristics are combined. In this paper, we propose ImitAL, a novel, domain-independent query strategy that encodes AL as a learning-to-rank problem and learns an optimal combination of both heuristics. We train ImitAL on large-scale simulated AL runs on purely synthetic datasets. To show that ImitAL was trained successfully, we perform an extensive evaluation, comparing our strategy against 7 other query strategies on 13 datasets from a wide range of domains.
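To make the two heuristics mentioned above concrete, the following minimal Python sketch scores unlabeled samples with a fixed linear blend of uncertainty-based informativeness and distance-based representativeness. It is an illustration under our own assumptions, not the ImitAL method itself: the function name `query_scores` and the trade-off weight `alpha` are hypothetical, whereas ImitAL replaces such a hand-tuned combination with a learned one.

```python
import numpy as np

def query_scores(probas, X_unlabeled, X_labeled, alpha=0.5):
    """Toy blend of the two heuristics (illustrative only).

    probas:      model class probabilities for the unlabeled pool, shape (n, n_classes)
    X_unlabeled: feature vectors of the unlabeled pool, shape (n, d)
    X_labeled:   feature vectors of the already labeled set, shape (m, d)
    alpha:       hypothetical trade-off weight between the two heuristics
    """
    # Informativeness: least-confidence uncertainty of the current model.
    informativeness = 1.0 - probas.max(axis=1)

    # Representativeness (one common proxy): distance to the nearest
    # already labeled sample, so far-away samples cover unexplored regions.
    dists = np.linalg.norm(
        X_unlabeled[:, None, :] - X_labeled[None, :, :], axis=-1
    )
    representativeness = dists.min(axis=1)
    representativeness /= representativeness.max() + 1e-12

    # Fixed linear combination; ImitAL instead learns how to combine them.
    return alpha * informativeness + (1.0 - alpha) * representativeness
```

The samples with the highest scores would then be queried for labels; the point of ImitAL is that the combination rule is learned from simulated AL runs rather than fixed by hand as in this sketch.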