Active Learning (AL) is a method for iteratively selecting data for annotation from a pool of unlabeled data, aiming to achieve better model performance than random selection. Previous AL approaches in Natural Language Processing (NLP) have been limited to either task-specific models that are trained from scratch at each iteration using only the labeled data at hand, or off-the-shelf pretrained language models (LMs) that are not effectively adapted to the downstream task. In this paper, we address these limitations by introducing BALM: Bayesian Active Learning with pretrained language Models. We first propose to adapt the pretrained LM to the downstream task by continuing training on all the available unlabeled data, and then use it for AL. We also suggest a simple yet effective fine-tuning method to ensure that the adapted LM is properly trained in both low- and high-resource scenarios during AL. We finally apply Monte Carlo dropout to the downstream model to obtain well-calibrated confidence scores for data selection with uncertainty sampling. Our experiments on five standard natural language understanding tasks demonstrate that BALM provides substantial data efficiency improvements compared to various combinations of acquisition functions, models, and fine-tuning methods proposed in recent AL literature.
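To make the acquisition step concrete, the sketch below shows one common way to implement uncertainty sampling with Monte Carlo dropout, as mentioned in the abstract. It is not the paper's exact implementation: `model` is assumed to be a PyTorch classifier with dropout layers (e.g. a fine-tuned Transformer whose forward pass returns an object with a `.logits` field), and `unlabeled_loader`, `budget`, and `n_samples` are hypothetical names introduced here for illustration.

```python
import torch
import torch.nn.functional as F


def mc_dropout_probs(model, inputs, n_samples=10):
    """Run several stochastic forward passes with dropout active and return
    per-pass class probabilities with shape (n_samples, batch, n_classes)."""
    model.train()  # keep dropout layers active at inference time
    with torch.no_grad():
        samples = [F.softmax(model(**inputs).logits, dim=-1)
                   for _ in range(n_samples)]
    return torch.stack(samples, dim=0)


def predictive_entropy(probs):
    """Entropy of the mean predictive distribution (higher = more uncertain)."""
    mean_probs = probs.mean(dim=0)
    return -(mean_probs * torch.log(mean_probs + 1e-12)).sum(dim=-1)


def select_for_annotation(model, unlabeled_loader, budget=100, n_samples=10):
    """Score the unlabeled pool with MC dropout and return the indices of the
    `budget` most uncertain examples to send for annotation."""
    scores = []
    for batch in unlabeled_loader:
        probs = mc_dropout_probs(model, batch, n_samples)
        scores.append(predictive_entropy(probs))
    scores = torch.cat(scores)
    return torch.topk(scores, k=min(budget, scores.numel())).indices
```

In a pool-based AL loop, the selected indices would be sent to annotators, the newly labeled examples added to the training set, and the model re-trained before the next acquisition round; other acquisition scores (e.g. BALD) can be substituted for predictive entropy in the same structure.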