Active metric learning is the problem of incrementally selecting batches of training data (typically, ordered triplets) to annotate, in order to improve a learned model of a metric over some input domain as rapidly as possible. Standard approaches, which select each triplet in a batch independently, are susceptible to highly correlated batches with many redundant triplets and hence low overall utility. While there has been recent work on selecting decorrelated batches for metric learning \cite{kumari2020batch}, these methods rely on ad hoc heuristics that estimate the correlation between only two triplets at a time. We present a novel approach for batch-mode active metric learning based on the Maximum Entropy Principle: we collectively select batches with maximum joint entropy, which captures both the informativeness and the diversity of the triplets. The entropy is derived from second-order statistics estimated by dropout. We take advantage of the fact that the entropy function is monotonically increasing and submodular to construct an efficient greedy algorithm based on Gram-Schmidt orthogonalization that is provably $\left( 1 - \frac{1}{e} \right)$-optimal. Our approach is the first batch-mode active metric learning method to define a unified score that balances informativeness and diversity for an entire batch of triplets. Experiments with several real-world datasets demonstrate that our algorithm is robust and consistently outperforms the state-of-the-art.
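As a minimal sketch of the greedy selection step described above (assuming each candidate triplet is summarized by a feature vector derived from dropout statistics, so that the batch's joint entropy is proportional to the log-determinant of the batch's Gram matrix; the function name and representation are illustrative, not the paper's implementation):

```python
import numpy as np

def greedy_max_entropy_batch(features, batch_size):
    """Greedily select a batch maximizing log-det of its Gram matrix.

    features: (n, d) array; row i is an (assumed) feature embedding of
    triplet i, e.g. built from second-order statistics estimated by dropout.
    Since log-det entropy is monotone submodular, greedy selection enjoys
    a (1 - 1/e) approximation guarantee; the Gram-Schmidt update below
    makes each greedy step O(n * d).
    """
    n, d = features.shape
    # Residual of each candidate orthogonal to the span of selected items;
    # det(Gram) of the selected set is the product of squared residual norms.
    residuals = features.astype(float).copy()
    selected = []
    for _ in range(min(batch_size, n)):
        norms = np.linalg.norm(residuals, axis=1)
        norms[selected] = -np.inf  # never re-pick a selected triplet
        i = int(np.argmax(norms))
        if norms[i] <= 1e-12:      # remaining candidates add no entropy
            break
        selected.append(i)
        q = residuals[i] / norms[i]              # new orthonormal direction
        residuals -= np.outer(residuals @ q, q)  # Gram-Schmidt deflation
    return selected
```

In this sketch the marginal entropy gain of a candidate reduces to the norm of its residual after projecting out the already-selected directions, so a redundant (correlated) triplet contributes nothing and is skipped in favor of a diverse one.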