Distance metric learning has attracted considerable interest for solving machine learning and pattern recognition problems over the past decades. In this work we present a simple approach, based on concepts from statistical physics, to learn an optimal distance metric for a given problem. We formulate the task as a typical statistical physics problem: distances between patterns represent the constituents of a physical system, and the objective function corresponds to the energy. We then express the problem as the minimization of the free energy of a complex system, which is equivalent to distance metric learning. As with many problems in physics, we propose an approach based on Metropolis Monte Carlo to find the best distance metric. This provides a natural way to learn the distance metric, in which the learning process can be intuitively seen as stretching and rotating the metric space until some heuristic is satisfied. Our proposed method can handle a wide variety of constraints, including those with spurious local minima. The approach works surprisingly well with stochastic nearest neighbors from neighborhood component analysis (NCA). Experimental results on artificial and real-world data sets reveal a clear superiority over a number of state-of-the-art distance metric learning methods for nearest neighbors classification.
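The idea of treating the objective as an energy and sampling metrics with Metropolis Monte Carlo can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the metric is parametrized by a linear map A (so the distance is the squared Euclidean distance after projecting with A), the energy is the negative NCA objective, proposals perturb one entry of A with Gaussian noise, and the temperature is held fixed. The function names, the toy data, and all parameter values are hypothetical.

```python
import math
import random


def nca_energy(A, X, y):
    """Energy = negative NCA objective (assumed form).

    The NCA objective is the expected number of points whose stochastic
    nearest neighbor, chosen with softmax weights over negative squared
    distances in the projected space, has the same label.
    """
    # Project every point with the linear map A.
    Z = [[sum(a * x for a, x in zip(row, xi)) for row in A] for xi in X]
    total = 0.0
    for i, zi in enumerate(Z):
        # Unnormalized neighbor weights; a point never selects itself.
        w = [
            0.0 if j == i
            else math.exp(-sum((p - q) ** 2 for p, q in zip(zi, zj)))
            for j, zj in enumerate(Z)
        ]
        s = sum(w)
        if s > 0.0:
            total += sum(wj for wj, yj in zip(w, y) if yj == y[i]) / s
    return -total


def metropolis_metric(X, y, steps=2000, temperature=0.05,
                      step_size=0.1, seed=0):
    """Metropolis search over A at a fixed temperature (illustrative only)."""
    rng = random.Random(seed)
    d = len(X[0])
    # Start from the identity, i.e. the plain Euclidean metric.
    A = [[1.0 if r == c else 0.0 for c in range(d)] for r in range(d)]
    energy = nca_energy(A, X, y)
    best_A, best_energy = [row[:] for row in A], energy
    for _ in range(steps):
        # Propose a small random change to one entry of A.
        r, c = rng.randrange(d), rng.randrange(d)
        old = A[r][c]
        A[r][c] = old + rng.gauss(0.0, step_size)
        new_energy = nca_energy(A, X, y)
        delta = new_energy - energy
        # Metropolis rule: always accept downhill moves, accept uphill
        # moves with probability exp(-delta / temperature).
        if delta <= 0.0 or rng.random() < math.exp(-delta / temperature):
            energy = new_energy
            if energy < best_energy:
                best_A, best_energy = [row[:] for row in A], energy
        else:
            A[r][c] = old  # reject: restore the previous entry
    return best_A, best_energy


# Toy data (hypothetical): feature 0 is mostly noise, feature 1 carries the label.
X = [[0.0, 0.0], [3.0, 0.1], [-3.0, -0.1],
     [0.5, 1.0], [3.5, 1.1], [-2.5, 0.9]]
y = [0, 0, 0, 1, 1, 1]

identity = [[1.0, 0.0], [0.0, 1.0]]
initial_energy = nca_energy(identity, X, y)
best_A, best_energy = metropolis_metric(X, y)
print("initial energy:", initial_energy)
print("best energy found:", best_energy)
```

Because uphill moves are sometimes accepted, the sampler can leave a local minimum that a pure descent method would stay in. The sketch tracks the lowest-energy A seen so far and returns it, so the returned energy is never worse than that of the starting Euclidean metric.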