Active metric learning and classification using similarity queries

Active learning is commonly used to train label-efficient models by adaptively selecting the most informative queries. However, most active learning strategies are designed either to learn a representation of the data (e.g., embedding or metric learning) or to perform well on a task (e.g., classification) on the data. Many machine learning tasks, by contrast, combine representation learning with a task-specific goal. Motivated by this, we propose a novel unified query framework that can be applied to any problem in which a key component is learning a representation of the data that reflects similarity. Our approach builds on similarity or nearest neighbor (NN) queries, which seek to select samples that result in improved embeddings. Each query consists of a reference and a set of objects, with an oracle selecting the object most similar (i.e., nearest) to the reference. To reduce the number of solicited queries, they are chosen adaptively according to an information-theoretic criterion. We demonstrate the effectiveness of the proposed strategy on two tasks, active metric learning and active classification, using a variety of synthetic and real-world datasets. In particular, we demonstrate that actively selected NN queries outperform recently developed active triplet selection methods in a deep metric learning setting. Further, we show that in classification, actively selecting class labels can be reformulated as a process of selecting the most informative NN query, allowing direct application of our method.
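To make the query structure concrete, the following is a minimal sketch of an NN query (a reference plus a candidate set, answered by an oracle) and of choosing among queries adaptively. It is not the paper's criterion: the softmax response model, the `temperature` parameter, and the use of predictive entropy as the selection score are all assumptions made for illustration, and every function name here is hypothetical.

```python
import numpy as np

def response_probs(emb, ref, candidates, temperature=1.0):
    """Assumed response model (not from the paper): softmax over
    negative squared distances in the current embedding."""
    d2 = np.sum((emb[candidates] - emb[ref]) ** 2, axis=1)
    logits = -d2 / temperature
    logits -= logits.max()  # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum()

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    p = np.clip(p, 1e-12, 1.0)
    return float(-np.sum(p * np.log(p)))

def oracle_answer(true_emb, ref, candidates):
    """Oracle: return the candidate nearest to the reference
    under the (normally hidden) true embedding."""
    d2 = np.sum((true_emb[candidates] - true_emb[ref]) ** 2, axis=1)
    return candidates[int(np.argmin(d2))]

def select_query(emb, queries):
    """Pick the index of the query whose answer is most uncertain
    under the current embedding (entropy as an illustrative proxy)."""
    scores = [entropy(response_probs(emb, ref, cands)) for ref, cands in queries]
    return int(np.argmax(scores))

# Usage: a noisy estimate of a hidden 2-D embedding of 12 objects.
rng = np.random.default_rng(0)
true_emb = rng.normal(size=(12, 2))
emb = true_emb + 0.5 * rng.normal(size=(12, 2))
queries = [(0, [1, 2, 3]), (4, [5, 6, 7]), (8, [9, 10, 11])]
best = select_query(emb, queries)
ref, cands = queries[best]
answer = oracle_answer(true_emb, ref, cands)  # would be used to update emb
```

Entropy alone favors queries whose answer the current model cannot predict. A full information-theoretic criterion would typically also account for oracle noise, which this sketch omits.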


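Metric learning

The goal of metric learning is to measure how similar samples are to one another, which is one of the core problems of pattern recognition. The performance of many machine learning methods, including classifiers such as k-nearest neighbors, support vector machines, and radial basis function networks, the K-means clustering method, and some graph-based methods, depends largely on the choice of similarity measure between samples. Metric learning usually aims to make the distances between samples of the same class as small as possible and the distances between samples of different classes as large as possible.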
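The usual metric learning objective, small distances within a class and large distances across classes, can be illustrated with a triplet hinge loss under a learned linear map. This is a generic textbook-style sketch rather than anything specified in the source: the linear map `L`, the `margin` value, and both function names are assumptions chosen for illustration.

```python
import numpy as np

def mahalanobis_sq(L, x, y):
    """Squared distance after applying the linear map L,
    i.e. (x - y)^T (L^T L) (x - y)."""
    diff = L @ (x - y)
    return float(diff @ diff)

def triplet_hinge_loss(L, anchor, positive, negative, margin=1.0):
    """Zero once the same-class sample (positive) is closer to the anchor
    than the different-class sample (negative) by at least `margin`."""
    d_pos = mahalanobis_sq(L, anchor, positive)
    d_neg = mahalanobis_sq(L, anchor, negative)
    return max(0.0, margin + d_pos - d_neg)

# Usage: with the identity map, a well-separated triplet incurs no loss.
L = np.eye(2)
anchor = np.array([0.0, 0.0])
positive = np.array([0.1, 0.0])
negative = np.array([3.0, 0.0])
loss = triplet_hinge_loss(L, anchor, positive, negative)
```

Minimizing such a loss over `L` across many triplets pulls same-class samples together and pushes different-class samples apart in the mapped space.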
