Batch active learning is a popular approach for efficiently training machine learning models on large, initially unlabeled datasets, which repeatedly acquires labels for a batch of data points. However, many recent batch active learning methods are white-box approaches limited to differentiable parametric models: they score unlabeled points using acquisition functions based on model embeddings or first- and second-order derivatives. In this paper, we propose black-box batch active learning for regression tasks as an extension of white-box approaches. This approach is compatible with a wide range of machine learning models, including standard and Bayesian deep learning models as well as non-differentiable models such as random forests. It is rooted in Bayesian principles and utilizes recent kernel-based approaches. Importantly, our method relies only on model predictions. This allows us to extend a wide range of existing state-of-the-art white-box batch active learning methods (BADGE, BAIT, LCMD) to black-box models. We demonstrate the effectiveness of our approach through extensive experimental evaluations on regression datasets, achieving surprisingly strong performance compared to white-box approaches for deep learning models.
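To make the prediction-only idea concrete, the following is a minimal, hypothetical sketch (not the paper's actual algorithm): an ensemble of black-box models is trained, each pool point's vector of per-model predictions serves as its "embedding", and a batch is selected greedily by farthest-point sampling in that embedding space. The ensemble construction, the centering step, and the selection rule are all illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def prediction_embeddings(X_train, y_train, X_pool, n_models=10, seed=0):
    # Illustrative assumption: build a small ensemble of black-box models
    # (here random forests with different seeds) and use each pool point's
    # vector of per-model predictions as its embedding.
    # Only model predictions are needed -- no gradients or internals.
    preds = []
    for m in range(n_models):
        model = RandomForestRegressor(n_estimators=50, random_state=seed + m)
        model.fit(X_train, y_train)
        preds.append(model.predict(X_pool))
    E = np.stack(preds, axis=1)                # shape (n_pool, n_models)
    # Center per point so the embedding captures ensemble disagreement.
    return E - E.mean(axis=1, keepdims=True)

def greedy_farthest_point_batch(E, batch_size):
    # Greedy farthest-point selection: start from the point with the
    # largest embedding norm (highest disagreement), then repeatedly add
    # the pool point farthest from all points chosen so far.
    first = int(np.argmax(np.linalg.norm(E, axis=1)))
    chosen = [first]
    dists = np.linalg.norm(E - E[first], axis=1)
    for _ in range(batch_size - 1):
        nxt = int(np.argmax(dists))
        chosen.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(E - E[nxt], axis=1))
    return chosen
```

A usage sketch: compute `E = prediction_embeddings(X_train, y_train, X_pool)`, then `greedy_farthest_point_batch(E, k)` returns the indices of `k` pool points to label next, which are both uncertain (high ensemble disagreement) and diverse (spread out in prediction space).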