Active Learning (AL) aims to reduce the labeling burden by interactively querying the most informative observations from a data pool. Despite extensive research on improving AL query methods over the past years, recent studies have questioned the advantages of AL, especially in light of emerging alternative training paradigms such as semi-supervised learning (Semi-SL) and self-supervised learning (Self-SL). As a result, today's AL literature paints an inconsistent picture and leaves practitioners wondering whether and how to employ AL in their tasks. We argue that this heterogeneous landscape is caused by the lack of a systematic and realistic evaluation of AL algorithms, one that covers key factors such as complex and imbalanced datasets, realistic labeling scenarios, systematic method configuration, and the integration of Semi-SL and Self-SL. To this end, we present an AL benchmarking suite and run extensive experiments on five datasets, shedding light on two questions: when and how should AL be applied?