Linear discriminant analysis (LDA) is a well-known method for multiclass classification and dimensionality reduction. In general, however, ordinary LDA does not achieve high prediction accuracy when observations in some classes are difficult to classify. This study proposes a novel cluster-based LDA method that significantly improves prediction accuracy. We adopt hierarchical clustering and define the dissimilarity between two clusters by the cross-validation (CV) value, so that clusters are constructed to minimize the misclassification error rate. This approach carries a heavy computational load because the CV value must be computed at each step of the hierarchical clustering algorithm. To address this issue, we develop a regression formulation for LDA and construct an efficient algorithm that computes an approximate value of the CV. We investigate the performance of the proposed method by applying it to both artificial and real datasets. Numerical experiments and theoretical analysis indicate that the proposed method provides high prediction accuracy with fast computation.
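To make the clustering idea concrete, the following is a minimal sketch of one plausible reading of the procedure, not the authors' implementation. It assumes that clusters are groups of classes, and that at each step the pair of clusters to merge is the one whose merged labeling gives the lowest k-fold CV misclassification rate under ordinary LDA. All function names (`lda_fit`, `lda_predict`, `cv_error`, `merge_classes_by_cv`), the ridge term, the fold count, and the toy data are assumptions introduced for illustration. The sketch recomputes the CV value by brute force for every candidate merge, which is exactly the computational load the abstract mentions; the regression-based approximation of the CV value is not reproduced here.

```python
import itertools
import numpy as np


def lda_fit(X, y):
    """Fit LDA with a pooled within-class covariance matrix."""
    labels = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in labels])
    centered = X - means[np.searchsorted(labels, y)]
    pooled = centered.T @ centered / max(len(X) - len(labels), 1)
    # Small ridge term (an assumption of this sketch) keeps the inverse stable.
    precision = np.linalg.inv(pooled + 1e-6 * np.eye(X.shape[1]))
    log_priors = np.log(np.array([(y == c).mean() for c in labels]))
    return labels, means, precision, log_priors


def lda_predict(model, X):
    """Assign each row of X to the label with the largest discriminant score."""
    labels, means, precision, log_priors = model
    # Score: x' S^-1 mu_k - 0.5 * mu_k' S^-1 mu_k + log(prior_k)
    quad = 0.5 * np.sum(means @ precision * means, axis=1)
    scores = X @ precision @ means.T - quad + log_priors
    return labels[np.argmax(scores, axis=1)]


def cv_error(X, y, n_folds=5, seed=0):
    """k-fold CV misclassification rate of LDA for the labeling y."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    errors = 0
    for test_idx in folds:
        train = np.ones(len(X), dtype=bool)
        train[test_idx] = False
        model = lda_fit(X[train], y[train])
        errors += np.sum(lda_predict(model, X[test_idx]) != y[test_idx])
    return errors / len(X)


def merge_classes_by_cv(X, y, n_clusters):
    """Greedily merge clusters of classes; each step picks the lowest-CV-error merge."""
    clusters = [frozenset([c]) for c in np.unique(y).tolist()]
    history = []
    while len(clusters) > n_clusters:
        best = None
        for a, b in itertools.combinations(clusters, 2):
            candidate = [c for c in clusters if c not in (a, b)] + [a | b]
            cluster_of = {cls: i for i, members in enumerate(candidate) for cls in members}
            z = np.array([cluster_of[c] for c in y.tolist()])
            err = cv_error(X, z)  # brute-force CV for every candidate merge
            if best is None or err < best[0]:
                best = (err, candidate)
        history.append(best[0])
        clusters = best[1]
    return clusters, history


# Toy data: classes 0 and 1 overlap heavily, classes 2 and 3 are well separated.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [0.3, 0.0], [8.0, 0.0], [0.0, 8.0]])
y = np.repeat(np.arange(4), 60)
X = centers[y] + rng.normal(size=(240, 2))
clusters, history = merge_classes_by_cv(X, y, n_clusters=3)
print(clusters, history)
```

In this toy setting the two overlapping classes are the natural candidates for the first merge, since any other merge leaves their mutual confusion in the error rate. Each merge step evaluates every pair of clusters with a full k-fold CV, so the cost grows quickly with the number of classes, which is the motivation the abstract gives for an approximate CV computation.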