We propose Cartography Active Learning (CAL), a novel Active Learning (AL) algorithm that uses the model's behavior on individual instances during training as a proxy for finding the most informative instances to label. CAL is inspired by data maps, which were recently proposed to derive insights into dataset quality (Swayamdipta et al., 2020). On popular text classification tasks, we compare CAL with commonly used AL strategies, which instead rely on post-training behavior. We demonstrate that CAL is competitive with these methods, showing that training dynamics derived from small seed data can be used successfully for AL. We provide insights into our new AL method by analyzing batch-level statistics using the data maps. Our results further show that CAL yields a more data-efficient learning strategy, achieving comparable or better results with considerably less training data.
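To make the notion of training dynamics concrete, the following minimal sketch computes the three per-instance data-map statistics defined by Swayamdipta et al. (2020): confidence (mean probability assigned to the gold label across epochs), variability (standard deviation of that probability), and correctness (fraction of epochs in which the instance was classified correctly). The function name `data_map_statistics` and the array layout are assumptions made for illustration. This is not CAL's selection rule, which the abstract does not specify.

```python
import numpy as np


def data_map_statistics(gold_probs, correct):
    """Compute data-map statistics per training instance.

    gold_probs: array-like of shape (n_epochs, n_instances); the probability
        the model assigned to the gold label at the end of each epoch.
    correct: array-like of shape (n_epochs, n_instances); True where the
        model's prediction matched the gold label in that epoch.

    Both inputs are hypothetical; they would be logged during training.
    """
    gold_probs = np.asarray(gold_probs, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    if gold_probs.shape != correct.shape:
        raise ValueError("gold_probs and correct must have the same shape")

    confidence = gold_probs.mean(axis=0)   # mean gold-label probability
    variability = gold_probs.std(axis=0)   # population std across epochs
    correctness = correct.mean(axis=0)     # fraction of epochs predicted correctly
    return confidence, variability, correctness
```

In a data map, instances with high confidence and low variability are typically read as easy to learn, those with low confidence and low variability as hard to learn, and those with high variability as ambiguous.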