Active metric learning is the problem of incrementally selecting batches of training data (typically, ordered triplets) to annotate, in order to improve a learned model of a metric over some input domain as rapidly as possible. Standard approaches, which select each triplet in a batch independently, are susceptible to highly correlated batches with many redundant triplets and hence low overall utility. While there has been recent work on selecting decorrelated batches for metric learning \cite{kumari2020batch}, these methods rely on ad hoc heuristics that estimate the correlation between only two triplets at a time. We present a novel approach for batch-mode active metric learning based on the Maximum Entropy Principle: we collectively select batches with maximum joint entropy, which captures both the informativeness and the diversity of the triplets. The entropy is derived from second-order statistics estimated by dropout. We take advantage of the fact that the entropy function is monotonically increasing and submodular to construct an efficient greedy algorithm based on Gram-Schmidt orthogonalization that is provably $\left( 1 - \frac{1}{e} \right)$-optimal. Our approach is the first batch-mode active metric learning method to define a unified score that balances informativeness and diversity for an entire batch of triplets. Experiments with several real-world datasets demonstrate that our algorithm is robust and consistently outperforms the state-of-the-art.
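As a minimal sketch of the greedy selection step described above (assuming each candidate triplet is summarized by a feature vector derived from dropout statistics, so that the batch's joint entropy is proportional to the log-determinant of the batch's Gram matrix; the function name and representation are illustrative, not the paper's implementation):

```python
import numpy as np

def greedy_max_entropy_batch(features, batch_size):
    """Greedily select a batch maximizing log-det of its Gram matrix.

    features: (n, d) array; row i is an (assumed) feature embedding of
    triplet i, e.g. built from second-order statistics estimated by dropout.
    Since log-det entropy is monotone submodular, greedy selection enjoys
    a (1 - 1/e) approximation guarantee; the Gram-Schmidt update below
    makes each greedy step O(n * d).
    """
    n, d = features.shape
    # Residual of each candidate orthogonal to the span of selected items;
    # det(Gram) of the selected set is the product of squared residual norms.
    residuals = features.astype(float).copy()
    selected = []
    for _ in range(min(batch_size, n)):
        norms = np.linalg.norm(residuals, axis=1)
        norms[selected] = -np.inf  # never re-pick a selected triplet
        i = int(np.argmax(norms))
        if norms[i] <= 1e-12:      # remaining candidates add no entropy
            break
        selected.append(i)
        q = residuals[i] / norms[i]              # new orthonormal direction
        residuals -= np.outer(residuals @ q, q)  # Gram-Schmidt deflation
    return selected
```

In this sketch the marginal entropy gain of a candidate reduces to the norm of its residual after projecting out the already-selected directions, so a redundant (correlated) triplet contributes nothing and is skipped in favor of a diverse one.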