When training automated systems, adapting the representation of the data by learning a problem-specific metric has been shown to be beneficial. This metric, however, is global. We extend this idea and, for the widely used family of k-nearest-neighbor algorithms, develop a method for learning locally adaptive metrics. These local metrics not only improve performance but are also naturally interpretable. To demonstrate important aspects of how our approach works, we conduct a number of experiments on synthetic data sets, and we show its usefulness on real-world benchmark data sets.