Exploring dual information in distance metric learning for clustering

Distance metric learning algorithms aim to measure similarities and distances between data points appropriately. In the context of clustering, metric learning is typically applied with the assistance of side information provided by experts, most commonly expressed as must-link and cannot-link constraints. In this setting, distance metric learning algorithms move pairs of data points involved in must-link constraints closer together, while pairs of points involved in cannot-link constraints are moved away from each other. For these algorithms to be effective, it is important to use a distance metric that matches the experts' knowledge, beliefs, and expectations, and the transformations made to satisfy the side information should preserve the geometrical properties of the dataset. It is also worthwhile to filter the constraints provided by the experts, keeping only the most useful ones and rejecting those that can harm the clustering process. To address these issues, we propose to exploit the dual information associated with the pairwise constraints of the semi-supervised clustering problem. Experiments clearly show that distance metric learning algorithms benefit from integrating this dual information.
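
The pull-together/push-apart behaviour described above can be sketched as follows. This is a generic, minimal hinge-loss formulation chosen for illustration, not the dual-information method proposed in the paper. It assumes a diagonal (per-feature weighted) squared Euclidean distance, a margin of 1.0 for cannot-link pairs, and plain projected gradient descent; `sq_dist`, `learn_diagonal_metric`, and the toy points are hypothetical.

```python
def sq_dist(x, y, w):
    """Squared weighted Euclidean distance: sum_k w[k] * (x[k] - y[k])**2."""
    return sum(wk * (xk - yk) ** 2 for xk, yk, wk in zip(x, y, w))


def learn_diagonal_metric(must_link, cannot_link, dim, margin=1.0, lr=0.05, steps=200):
    """Projected gradient descent on a nonnegative per-feature weight vector w.

    Hypothetical loss (an assumption, not the paper's objective):
      sum of sq_dist over must-link pairs
      + sum of max(0, margin - sq_dist) over cannot-link pairs.
    """
    w = [1.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        # Must-link pairs always contribute: shrinking their distance lowers the loss.
        for x, y in must_link:
            for k in range(dim):
                grad[k] += (x[k] - y[k]) ** 2
        # Cannot-link pairs contribute only while they sit inside the margin
        # (active hinge); the negative gradient then pushes them apart.
        for x, y in cannot_link:
            if sq_dist(x, y, w) < margin:
                for k in range(dim):
                    grad[k] -= (x[k] - y[k]) ** 2
        # Gradient step, then clip at zero so every feature weight stays nonnegative.
        w = [max(0.0, wk - lr * gk) for wk, gk in zip(w, grad)]
    return w


# Toy data (invented): feature 0 separates the two clusters, feature 1 is noise.
a, b = (0.0, 0.0), (0.1, 2.0)
c, d = (3.0, 0.1), (3.1, 2.1)
must_link = [(a, b), (c, d)]
cannot_link = [(a, c), (b, d)]

w = learn_diagonal_metric(must_link, cannot_link, dim=2)
print(w)  # the noise weight w[1] is driven to 0.0; w[0] stays positive
```

Because the must-link pairs differ mainly in the second feature, the learner drives that feature's weight to zero, while the cannot-link pairs stay farther apart than `margin`, the threshold at which the hinge term would start pushing them back.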

Metric learning

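The purpose of metric learning is to measure how close samples are to one another, which is also one of the core problems in pattern recognition. The performance of many machine learning methods, including classifiers such as k-nearest neighbors, support vector machines, and radial basis function networks, the k-means clustering method, and various graph-based methods, is largely determined by the choice of similarity measure between samples. The usual goal of metric learning is to make distances between samples of the same class as small as possible and distances between samples of different classes as large as possible.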
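
The dependence of nearest-neighbour classification on the chosen distance can be illustrated with a toy 1-NN example. The data and the weight vector `(1.0, 0.0)` are invented stand-ins for a metric that a learner might produce; `weighted_distance` and `nearest_neighbor_label` are hypothetical helper names.

```python
import math


def weighted_distance(x, y, w):
    """Weighted Euclidean distance: sqrt(sum_k w[k] * (x[k] - y[k])**2)."""
    return math.sqrt(sum(wk * (xk - yk) ** 2 for xk, yk, wk in zip(x, y, w)))


def nearest_neighbor_label(query, train, w):
    """1-NN: return the label of the training point closest to `query` under weights `w`."""
    label, _ = min(train, key=lambda item: weighted_distance(query, item[1], w))
    return label


# Feature 0 carries the class information; feature 1 is noise (toy data).
train = [("A", (0.0, 0.0)), ("B", (1.0, 5.0))]
query = (0.1, 4.0)  # intended to belong to class "A"

euclidean = nearest_neighbor_label(query, train, w=(1.0, 1.0))  # plain Euclidean distance
learned = nearest_neighbor_label(query, train, w=(1.0, 0.0))    # hand-picked "learned" weights
print(euclidean, learned)  # B A
```

Under plain Euclidean distance the noisy second feature dominates and the query is assigned to "B"; down-weighting that feature, as a metric learner would ideally do, recovers "A".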