Acquiring labeled data is challenging in many machine learning applications with limited budgets. Active learning provides a procedure for selecting the most informative data points, improving data efficiency by reducing the cost of labeling. The info-max learning principle of maximizing mutual information, exemplified by BALD, has been successful and widely adopted in active learning applications. However, this pool-based objective inherently introduces redundant selections and further incurs a high computational cost for batch selection. In this paper, we design and propose a new uncertainty measure, Balanced Entropy Acquisition (BalEntAcq), which captures the information balance between the uncertainty of the underlying softmax probability and the label variable. To do this, we approximate each marginal distribution by a Beta distribution. The Beta approximation enables us to formulate BalEntAcq as a ratio between an augmented entropy and the marginalized joint entropy. Because BalEntAcq admits a closed-form expression that depends only on the two parameters of each marginal Beta distribution, it is straightforward to parallelize. BalEntAcq is a purely standalone measure that requires no relational computations with other data points. Nevertheless, it yields a well-diversified selection near the decision boundary with a margin, unlike other existing uncertainty measures such as BALD, Entropy, or Mean Standard Deviation (MeanSD). Finally, we demonstrate on MNIST, CIFAR-100, SVHN, and TinyImageNet that our balanced entropy learning principle with BalEntAcq consistently outperforms well-known linearly scalable active learning methods, including the recently proposed PowerBALD, a simple but diversified variant of BALD.
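As a concrete illustration of the workflow the abstract describes (fitting a Beta distribution to each marginal softmax probability and scoring a point by a ratio of entropies), the following is a minimal Python sketch, not the paper's exact formula. The Beta fit uses moment matching on MC-dropout softmax samples; the marginalized joint entropy term uses the standard digamma identity for the expected Bernoulli entropy under a Beta marginal; the particular "augmented entropy" (Beta entropy plus one nat) and the per-class maximization are hypothetical placeholders, since the abstract does not spell out the closed form.

```python
# Illustrative BalEntAcq-style acquisition score (sketch, not the authors' exact definition).
import numpy as np
from scipy.stats import beta as beta_dist
from scipy.special import digamma


def fit_beta_moments(p_samples):
    """Moment-match a Beta(a, b) to each class's marginal softmax samples.

    p_samples: array of shape (T, K) -- T stochastic forward passes, K classes.
    Returns (a, b), each of shape (K,).
    """
    mu = p_samples.mean(axis=0)
    var = p_samples.var(axis=0) + 1e-12          # guard against zero variance
    common = np.clip(mu * (1.0 - mu) / var - 1.0, 1e-6, None)
    return mu * common, (1.0 - mu) * common


def balentacq_score(p_samples):
    """Hypothetical balanced-entropy score: ratio of an 'augmented' Beta entropy
    to a marginalized joint entropy, maximized over classes (illustrative choice)."""
    a, b = fit_beta_moments(p_samples)
    mu = a / (a + b)
    # Differential entropy of each marginal Beta(a, b), in closed form via scipy.
    h_beta = beta_dist(a, b).entropy()
    # E[-p log p - (1-p) log(1-p)] for p ~ Beta(a, b), via the digamma identity.
    h_joint = (digamma(a + b + 1) - mu * digamma(a + 1)
               - (1.0 - mu) * digamma(b + 1))
    score = (h_beta + 1.0) / (h_joint + 1e-12)   # "+1 nat" augmentation is a placeholder
    return score.max()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, K = 32, 10                                # MC-dropout passes, number of classes
    logits = rng.normal(size=(T, K))
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    print(f"BalEntAcq-style score: {balentacq_score(probs):.4f}")
```

Because the score for each candidate depends only on its own Beta parameters, it can be computed for an entire unlabeled pool in parallel, which is the linear-scalability property the abstract emphasizes.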