In this work, we initiate the study of one-round active learning, which aims to select a subset of unlabeled data points that achieves the highest model performance after labeling, using only the information from the initially labeled data points. The challenge of directly applying existing data selection criteria to the one-round setting is that they are not indicative of model performance when the available labeled data is limited. We address this challenge by explicitly modeling the dependence of model performance on the dataset. Specifically, we propose DULO, a data-driven framework for one-round active learning, wherein we learn a model that predicts the performance attainable with a given dataset and then leverage this model to guide the selection of unlabeled data. Our results demonstrate that DULO achieves state-of-the-art performance on various active learning benchmarks in the one-round setting.
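The two-stage idea described above can be illustrated with a minimal sketch. This is not DULO's actual implementation; it assumes a hypothetical setup where subset "utility" is linear in the subset's mean feature vector, a utility predictor is fit by least squares on random subsets of the labeled pool, and unlabeled points are then selected greedily to maximize predicted utility:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5

# Hypothetical ground-truth utility: performance after training on a subset,
# simplified here to a linear function of the subset's mean features.
w_true = rng.normal(size=d)

def true_utility(subset):
    return float(subset.mean(axis=0) @ w_true)

# Small initially labeled pool (the only data whose utility we can evaluate).
labeled = rng.normal(size=(30, d))

# Stage 1: learn a utility model from random subsets of the labeled pool.
X_feat, y = [], []
for _ in range(200):
    idx = rng.choice(len(labeled), size=10, replace=False)
    sub = labeled[idx]
    X_feat.append(sub.mean(axis=0))          # featurize the subset
    y.append(true_utility(sub))              # observed utility on labeled data
X_feat, y = np.array(X_feat), np.array(y)
w, *_ = np.linalg.lstsq(np.c_[X_feat, np.ones(len(X_feat))], y, rcond=None)

def predicted_utility(subset):
    phi = np.r_[subset.mean(axis=0), 1.0]
    return float(phi @ w)

# Stage 2: greedily select unlabeled points that maximize predicted utility,
# in a single round, without querying any labels.
unlabeled = rng.normal(size=(100, d))
selected = []
for _ in range(10):
    best = max(
        (i for i in range(len(unlabeled)) if i not in selected),
        key=lambda i: predicted_utility(unlabeled[[*selected, i]]),
    )
    selected.append(best)

print(sorted(selected))
```

The key property this sketch shares with the one-round setting is that the utility model is trained once on the initially labeled pool, and all subsequent selection decisions use only that learned model, never new labels.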