In deep active learning, selecting multiple examples to label at each step is essential for working efficiently, especially on large datasets. However, existing solutions to this problem in the Bayesian setting, such as BatchBALD, have significant limitations when selecting large batches, owing to the exponential complexity of computing mutual information for joint random variables. We therefore present the Large BatchBALD algorithm, a well-grounded approximation to BatchBALD that aims to achieve comparable quality while being more computationally efficient. We provide a complexity analysis of the algorithm, showing a reduction in computation time, especially for large batches. Furthermore, we present an extensive set of experimental results on image and text data, both on toy datasets and larger ones such as CIFAR-100.
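For context, the quantity at the heart of this family of methods is the per-example BALD score: the mutual information between a point's predicted label and the model parameters, estimated from stochastic forward passes (e.g. MC dropout). The sketch below is not the paper's algorithm; it is a minimal illustration of the single-point score, with a comment noting why the joint (BatchBALD-style) version becomes expensive. The array shapes and the greedy top-b selection are illustrative assumptions.

```python
import numpy as np

def bald_scores(probs):
    """BALD mutual-information score per unlabeled example.

    probs: array of shape (K, N, C) — K stochastic forward passes,
    N unlabeled examples, C classes. Returns an (N,) array of
    I(y; parameters | x) estimates.
    """
    mean_p = probs.mean(axis=0)                                 # (N, C) posterior-mean prediction
    H_of_mean = -(mean_p * np.log(mean_p + 1e-12)).sum(-1)      # entropy of the mean prediction
    mean_of_H = -(probs * np.log(probs + 1e-12)).sum(-1).mean(0)  # mean per-pass entropy
    return H_of_mean - mean_of_H                                # mutual information (>= 0)

# Scoring points independently and taking the top b is cheap but ignores
# redundancy between selected points; BatchBALD scores the *joint* label
# variable of the batch, whose exact evaluation grows exponentially in b —
# the bottleneck the abstract refers to.
rng = np.random.default_rng(0)
logits = rng.normal(size=(10, 5, 3))                      # K=10 passes, N=5 points, C=3 classes
probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
scores = bald_scores(probs)
batch = np.argsort(scores)[::-1][:2]                      # naive top-2 acquisition
```

By concavity of entropy, the score is non-negative, and it is zero when every forward pass agrees, so it favors points the posterior disagrees on.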