Distance metric learning algorithms aim to appropriately measure similarities and distances between data points. In the context of clustering, metric learning is typically applied with the aid of side-information provided by experts, most commonly expressed in the form of must-link and cannot-link constraints. In this setting, distance metric learning algorithms pull together pairs of data points involved in must-link constraints, while pushing apart pairs involved in cannot-link constraints. For these algorithms to be effective, it is important to use a distance metric that matches the expert's knowledge, beliefs, and expectations, and the transformations applied to comply with the side-information should preserve the geometrical properties of the dataset. It is also beneficial to filter the constraints provided by the experts, keeping only the most useful ones and rejecting those that could harm the clustering process. To address these issues, we propose to exploit the dual information associated with the pairwise constraints of the semi-supervised clustering problem. Experiments clearly show that distance metric learning algorithms benefit from integrating this dual information.
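To make the constraint-driven objective concrete, the following minimal sketch (not the paper's method; the dataset and the must-link/cannot-link sets are hypothetical) learns a diagonal Mahalanobis metric by gradient descent, shrinking the weighted distance of must-link pairs and applying a hinge penalty when cannot-link pairs fall inside a margin.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                      # toy dataset
must_link = [(0, 1), (2, 3), (4, 5)]               # hypothetical expert constraints
cannot_link = [(0, 50), (1, 60), (2, 70)]          # hypothetical expert constraints

w = np.ones(X.shape[1])                            # diagonal Mahalanobis weights
margin, lr = 1.0, 0.05

def sq_dist(w, a, b):
    # weighted squared distance between points a and b under the diagonal metric
    d = X[a] - X[b]
    return np.sum(w * d * d)

for _ in range(200):
    grad = np.zeros_like(w)
    # must-link pairs: gradient pulls their weighted distance down
    for a, b in must_link:
        d = X[a] - X[b]
        grad += d * d
    # cannot-link pairs: hinge term pushes them apart if closer than the margin
    for a, b in cannot_link:
        if sq_dist(w, a, b) < margin:
            d = X[a] - X[b]
            grad -= d * d
    w = np.maximum(w - lr * grad, 1e-6)            # keep the diagonal metric positive

print("learned diagonal metric:", w)

The learned weights can then be used to rescale the features before running a standard clustering algorithm such as k-means; this illustrates the general pull/push behaviour described above, not the dual-information approach proposed in the paper.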