Clustering and classification rely critically on distance metrics that provide meaningful comparisons between data points. We present mixed-integer optimization approaches for finding optimal distance metrics that generalize the Mahalanobis metric, which has been studied extensively in the literature. We also generalize and improve upon leading methods by removing their reliance on pre-designated "target neighbors," "triplets," and "similarity pairs." Another salient feature of our method is that it enables active learning: once an optimal metric has been computed, it recommends precise regions to sample in order to improve classification performance. This targeted acquisition can significantly reduce the computational burden by keeping the training data complete, representative, and economical. We demonstrate the classification and computational performance of the algorithms on several simple, intuitive examples, followed by results on real image and medical datasets.
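For readers unfamiliar with the Mahalanobis metric that the abstract takes as its starting point, the sketch below may help. It illustrates only the standard form d_M(x, y) = sqrt((x - y)^T M (x - y)) for a positive semidefinite matrix M. It does not reproduce the mixed-integer optimization approach or the generalized metrics of the paper. The function name `mahalanobis_distance`, the sample points, and the matrix `L` are assumptions chosen purely for illustration.

```python
import numpy as np

def mahalanobis_distance(x, y, M):
    """Return sqrt((x - y)^T M (x - y)) for a positive semidefinite matrix M."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(diff @ M @ diff))

# Two illustrative points (hypothetical data, not from the paper).
x = np.array([1.0, 2.0])
y = np.array([4.0, 6.0])

# With M equal to the identity, the metric reduces to Euclidean distance.
d_euclid = mahalanobis_distance(x, y, np.eye(2))

# Any M of the form L^T L is positive semidefinite; the resulting distance
# equals the Euclidean distance after mapping the points through L.
L = np.array([[2.0, 0.0],
              [0.0, 0.5]])  # illustrative linear map: stretch axis 0, shrink axis 1
M = L.T @ L
d_learned = mahalanobis_distance(x, y, M)
d_transformed = float(np.linalg.norm(L @ x - L @ y))

print(d_euclid, d_learned, d_transformed)
```

Under this view, choosing a Mahalanobis-type metric amounts to choosing the matrix M (equivalently, a linear map L) under which same-class points are close and different-class points are far apart.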