Active learning is commonly used to train label-efficient models by adaptively selecting the most informative queries. However, most active learning strategies are designed either to learn a representation of the data (e.g., embedding or metric learning) or to perform well on a task over the data (e.g., classification). Many machine learning problems, in contrast, combine representation learning with a task-specific goal. Motivated by this, we propose a novel unified query framework that can be applied to any problem in which a key component is learning a representation of the data that reflects similarity. Our approach builds on similarity, or nearest neighbor (NN), queries, which seek to select samples that result in improved embeddings. Each query consists of a reference and a set of objects, and an oracle selects the object most similar (i.e., nearest) to the reference. To reduce the number of solicited queries, they are chosen adaptively according to an information-theoretic criterion. We demonstrate the effectiveness of the proposed strategy on two tasks, active metric learning and active classification, using a variety of synthetic and real-world datasets. In particular, we demonstrate that actively selected NN queries outperform recently developed active triplet selection methods in a deep metric learning setting. Further, we show that in classification, actively selecting class labels can be reformulated as a process of selecting the most informative NN query, allowing direct application of our method.
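As an illustration of adaptive NN query selection, the following is a minimal sketch under stated assumptions, not the criterion used in this work. It assumes that uncertainty about the embedding is represented by a set of sampled embeddings, that the oracle's response follows a softmax over negative squared distances to the reference, and that the information-theoretic score is the mutual information between the response and the embedding. All function names (`response_probs`, `query_information`, `select_query`) and the `scale` parameter are hypothetical.

```python
import numpy as np


def response_probs(emb, ref, cands, scale=1.0):
    """Assumed oracle model: softmax over negative squared distances
    from each candidate object to the reference."""
    d2 = np.sum((emb[cands] - emb[ref]) ** 2, axis=1)
    logits = -scale * d2
    logits = logits - logits.max()  # subtract the max for numerical stability
    p = np.exp(logits)
    return p / p.sum()


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p))


def query_information(samples, ref, cands, scale=1.0):
    """Mutual information between the oracle response and the embedding,
    estimated from sampled embeddings of shape (S, N, d):
    H(mean of response distributions) - mean of H(response distribution)."""
    probs = np.array([response_probs(e, ref, cands, scale) for e in samples])
    return entropy(probs.mean(axis=0)) - np.mean([entropy(p) for p in probs])


def select_query(samples, queries, scale=1.0):
    """Return the (reference, candidates) query with the largest estimated
    information, together with the scores of all queries."""
    scores = [query_information(samples, ref, cands, scale)
              for ref, cands in queries]
    return queries[int(np.argmax(scores))], scores


# Hypothetical usage: 3 embedding samples of 6 objects in 2 dimensions.
rng = np.random.default_rng(0)
samples = rng.normal(size=(3, 6, 2))
queries = [(0, [1, 2, 3]), (4, [0, 5]), (2, [1, 4, 5])]
best_query, scores = select_query(samples, queries)
```

Under this sketch, a query scores highly when the sampled embeddings disagree about which object is nearest to the reference, so the oracle's answer is expected to resolve the most uncertainty. The classification case fits the same interface if the reference is an unlabeled point and the candidate set holds one representative per class, which is again an assumption made for illustration.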