Linear discriminant analysis (LDA) is a fundamental method for feature extraction and dimensionality reduction. Although many variants exist, classical LDA remains important as a cornerstone of pattern recognition. For a dataset containing $C$ clusters, the classical solution to LDA extracts at most $C-1$ features. In this paper, we introduce a novel solution to classical LDA, called LDA++, that yields $C$ features, each interpretable as a measure of similarity to one cluster. This novel solution bridges dimensionality reduction and multiclass classification. Specifically, we prove that, under some mild conditions, the optimal weights of a linear multiclass classifier for homoscedastic Gaussian data also constitute an optimal solution to LDA. In addition, this novel interpretable solution reveals some new facts about LDA and its relation to PCA. We provide a complete numerical solution for our novel method, covering 1) the case where the scatter matrices can be constructed explicitly, 2) the case where constructing the scatter matrices is infeasible, and 3) the kernel extension. The code is available at https://github.com/k-ghiasi/LDA-plus-plus.
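To make the $C-1$ feature limit of classical LDA concrete, the following is a minimal illustrative sketch using scikit-learn on the Iris dataset (both are assumptions, not part of the paper). The last two lines only show, by analogy, what a $C$-dimensional "one score per cluster" representation looks like via a linear classifier's decision scores; they do not implement the LDA++ algorithm itself.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Iris has C = 3 classes; classical LDA yields at most C - 1 = 2 components.
X, y = load_iris(return_X_y=True)
C = len(np.unique(y))

lda = LinearDiscriminantAnalysis()          # default SVD solver
Z_classical = lda.fit_transform(X, y)
print(Z_classical.shape)                    # (150, 2): at most C - 1 discriminant features

# Hypothetical analogy only: one linear score per class from the same fitted model.
# This is NOT the LDA++ construction of the paper, just an illustration of a
# C-dimensional, per-cluster representation.
scores = lda.decision_function(X)
print(scores.shape)                         # (150, 3): one score per class
```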