Active learning is a powerful method for training machine learning models with limited labeled data. One commonly used technique for active learning is BatchBALD, which uses Bayesian neural networks to find the most informative points to label in a pool set. However, BatchBALD can be very slow to compute, especially for larger datasets. In this paper, we propose a new approximation, k-BALD, which uses k-wise mutual information terms to approximate BatchBALD, making it much less expensive to compute. Results on the MNIST dataset show that k-BALD is significantly faster than BatchBALD while maintaining similar performance. Additionally, we propose a dynamic approach for choosing k based on the quality of the approximation, making k-BALD even more efficient for larger datasets.
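To make the k-wise approximation concrete, the sketch below implements the k = 2 case under one natural reading of the abstract: the joint mutual information I[y_1, ..., y_B; ω] that BatchBALD scores is expanded by inclusion-exclusion and truncated after the pairwise terms, giving Σ_i I[y_i; ω] − Σ_{i<j} I[y_i; y_j; ω]; because the y_i are conditionally independent given ω, each pairwise interaction term collapses to the unconditional I[y_i; y_j]. The Monte Carlo setup, array shapes, and function names here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy in nats along the class axis."""
    return -np.sum(p * np.log(p + eps), axis=axis)

def bald_scores(probs):
    """Per-point BALD scores I[y_i; w].

    probs: (S, N, C) class probabilities from S Monte Carlo parameter
    samples (e.g. MC-dropout passes) for N pool points and C classes.
    """
    mean_p = probs.mean(axis=0)                  # (N, C) marginal predictive
    h_marginal = entropy(mean_p)                 # H[y_i]
    h_conditional = entropy(probs).mean(axis=0)  # E_w[H[y_i | w]]
    return h_marginal - h_conditional            # I[y_i; w]

def pairwise_mi(probs, i, j, eps=1e-12):
    """Unconditional I[y_i; y_j] under the marginal joint predictive
    E_w[p(y_i | w) p(y_j | w)]; equals the interaction term I[y_i; y_j; w]
    since y_i and y_j are conditionally independent given w."""
    joint = np.einsum('sc,sd->cd', probs[:, i], probs[:, j]) / probs.shape[0]
    p_i, p_j = joint.sum(axis=1), joint.sum(axis=0)
    return np.sum(joint * (np.log(joint + eps)
                           - np.log(np.outer(p_i, p_j) + eps)))

def two_bald_batch(probs, batch_size):
    """Greedy batch selection with the 2-BALD objective:
    score(i | batch) = I[y_i; w] - sum_{j in batch} I[y_i; y_j]."""
    n = probs.shape[1]
    scores = bald_scores(probs).copy()
    batch = []
    for _ in range(batch_size):
        candidate = int(np.argmax(scores))
        batch.append(candidate)
        scores[candidate] = -np.inf
        # Penalize remaining points for redundancy with the new pick.
        for i in range(n):
            if np.isfinite(scores[i]):
                scores[i] -= pairwise_mi(probs, i, candidate)
    return batch
```

Given probs of shape (S, N, C) from, say, S stochastic forward passes over N pool points with C classes, two_bald_batch(probs, B) returns B pool indices. The pairwise penalty discourages redundant selections much as BatchBALD's joint entropy does, while only ever estimating O(B·N) pairwise terms rather than a joint entropy over the whole batch.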