In this paper, we develop a new classification method based on nearest centroids, which we call the nearest disjoint centroid classifier. Our method differs from the nearest centroid classifier in two aspects: (1) the centroids are defined based on disjoint subsets of features instead of all the features, and (2) the distance is induced by the dimensionality-normalized norm instead of the Euclidean norm. We provide several theoretical results regarding our method. In addition, we propose a simple algorithm based on adapted k-means clustering that finds the disjoint subsets of features used in our method, and we extend the algorithm to perform feature selection. We evaluate and compare the performance of our method against closely related classifiers on both simulated data and real-world gene expression datasets. The results demonstrate that our method outperforms the competing classifiers, achieving smaller misclassification rates and/or using fewer features across a variety of settings.
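The two differences described above can be sketched in code. The following is a minimal illustration, not the authors' implementation: it assumes each class is assigned a disjoint list of feature indices, takes each class centroid as the per-class mean over that class's own features, and assumes the dimensionality-normalized norm is the Euclidean norm divided by the square root of the subset size (so distances computed over subsets of different sizes remain comparable). The feature subsets are taken as given here; the paper obtains them via an adapted k-means procedure.

```python
import numpy as np

def fit_centroids(X, y, subsets):
    """Compute one centroid per class, restricted to that class's feature subset.

    subsets: dict mapping class label -> list of feature indices
             (assumed disjoint across classes).
    """
    return {k: X[y == k][:, idx].mean(axis=0) for k, idx in subsets.items()}

def predict(X, centroids, subsets):
    """Assign each sample to the class whose centroid is nearest under the
    (assumed) dimensionality-normalized norm: ||x_S - c|| / sqrt(|S|)."""
    preds = []
    for x in X:
        dists = {k: np.linalg.norm(x[subsets[k]] - centroids[k])
                    / np.sqrt(len(subsets[k]))
                 for k in subsets}
        preds.append(min(dists, key=dists.get))
    return np.array(preds)

# Toy example: class 0 lives on features {0, 1}, class 1 on features {2, 3}.
X = np.array([[1.0, 1.0, 0.0, 0.0],
              [1.1, 0.9, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 0.9, 1.1]])
y = np.array([0, 0, 1, 1])
subsets = {0: [0, 1], 1: [2, 3]}
cents = fit_centroids(X, y, subsets)
preds = predict(X, cents, subsets)
```

Because each class measures distance only along its own features, the division by sqrt(|S_k|) keeps the per-class distances on a common scale even when the subsets differ in size.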