In this paper, we develop a new classification method based on nearest centroids, which we call the nearest disjoint centroid classifier. Our method differs from the nearest centroid classifier in two aspects: (1) the centroids are defined on disjoint subsets of features instead of all the features, and (2) the distance is induced by the dimensionality-normalized norm instead of the Euclidean norm. We provide several theoretical results regarding our method. In addition, we propose a simple algorithm based on adapted k-means clustering that finds the disjoint subsets of features used in our method, and we extend the algorithm to perform feature selection. We evaluate and compare the performance of our method against other classification methods on both simulated data and real-world gene expression datasets. The results demonstrate that our method outperforms competing classifiers by achieving smaller misclassification rates and/or using fewer features in a variety of settings.
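The two defining ingredients above (per-class centroids computed on disjoint feature subsets, and a dimensionality-normalized distance) can be illustrated with a minimal sketch. The function names, the squared-distance form, and the toy data are illustrative assumptions, not the paper's exact specification:

```python
import numpy as np

# Sketch of the nearest disjoint centroid rule: each class k owns a
# disjoint feature subset subsets[k]; its centroid lives only in that
# subspace, and distances are normalized by the subset's dimensionality.

def fit_centroids(X, y, subsets):
    """Compute each class centroid on its own (disjoint) feature subset."""
    return {k: X[y == k][:, feats].mean(axis=0) for k, feats in subsets.items()}

def predict(X, centroids, subsets):
    """Assign each sample to the class minimizing ||x_S - c_k||^2 / |S|,
    i.e., the dimensionality-normalized squared distance (an assumption
    on the exact form of the norm)."""
    labels = sorted(subsets)
    preds = []
    for x in X:
        dists = [np.sum((x[subsets[k]] - centroids[k]) ** 2) / len(subsets[k])
                 for k in labels]
        preds.append(labels[int(np.argmin(dists))])
    return np.array(preds)

# Toy example: two classes, each characterized by a disjoint pair of features.
X = np.array([[5.0, 5.1, 0.0, 0.1],
              [4.9, 5.2, 0.2, 0.0],
              [0.1, 0.0, 5.0, 4.9],
              [0.0, 0.2, 5.1, 5.0]])
y = np.array([0, 0, 1, 1])
subsets = {0: [0, 1], 1: [2, 3]}  # hypothetical assignment; the paper's
                                  # adapted k-means algorithm would find these
cents = fit_centroids(X, y, subsets)
print(predict(X, cents, subsets))
```

In practice the feature subsets would not be hand-specified as here but learned by the adapted k-means procedure described in the abstract.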