Classifiers based on linear discriminant analysis (LDA) tend to falter in many practical settings where the training data size is smaller than, or comparable to, the number of features. As a remedy, various regularized LDA (RLDA) methods have been proposed. These methods may still perform poorly, depending on the size and quality of the available training data. In particular, deviation of the test data from the training data model, for example due to noise contamination, can cause severe performance degradation. Moreover, these methods rely further on the Gaussian assumption (upon which LDA is established) to tune their regularization parameters, which may compromise accuracy when dealing with real data. To address these issues, we propose a doubly regularized LDA classifier, which we denote R2LDA. In the proposed approach, the RLDA score function is rewritten as an inner product of two vectors. Substituting regularized estimators of these vectors yields the R2LDA score function, which involves two regularization parameters. To set these parameters, we adopt three existing regularization techniques: the constrained perturbation regularization approach (COPRA), the bounded perturbation regularization (BPR) algorithm, and the generalized cross-validation (GCV) method. These methods tune the regularization parameters based on linear estimation models in which the square root of the sample covariance matrix is the linear operator. Results on both synthetic and real data demonstrate the consistency and effectiveness of the proposed R2LDA approach, especially in scenarios where the test data are contaminated with noise not observed during training.
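The inner-product construction described above can be illustrated with a minimal NumPy sketch. It is one plausible reading of the abstract, not the authors' implementation. It assumes the two vectors are the whitened, centred test point and the whitened mean difference, and that each is estimated by ridge-regularized least squares from a linear model whose operator is the square root of the pooled sample covariance. The function names `fit_r2lda` and `r2lda_score` are hypothetical, and the two parameters `gamma1` and `gamma2` are supplied by hand here rather than tuned by COPRA, BPR, or GCV.

```python
import numpy as np


def fit_r2lda(X0, X1):
    """Estimate class means and the eigendecomposition of the pooled covariance.

    X0, X1: arrays of shape (n0, p) and (n1, p) holding the training samples
    of class 0 and class 1. All names here are illustrative assumptions.
    """
    n0, n1 = len(X0), len(X1)
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled sample covariance matrix (singular whenever n0 + n1 - 2 < p).
    S = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (n0 + n1 - 2)
    # S = U diag(d) U^T; clip tiny negative eigenvalues caused by round-off.
    d, U = np.linalg.eigh(S)
    d = np.clip(d, 0.0, None)
    return {"m0": m0, "m1": m1, "d": d, "U": U, "log_prior": np.log(n0 / n1)}


def r2lda_score(model, x, gamma1, gamma2):
    """Score one or more test points; a positive score favours class 0.

    Assumed linear models (S_half is the square root of S):
        x - (m0 + m1) / 2 = S_half @ a + noise
        m0 - m1           = S_half @ b + noise
    with ridge estimates a_hat = (S + gamma1 I)^{-1} S_half (x - mid) and
    b_hat = (S + gamma2 I)^{-1} S_half (m0 - m1). Both gammas must be > 0
    when S is singular.
    """
    m0, m1, d, U = model["m0"], model["m1"], model["d"], model["U"]
    mid = 0.5 * (m0 + m1)
    # Work in the eigenbasis of S; the inner product is unchanged by U.
    z = (np.atleast_2d(x) - mid) @ U              # shape (k, p)
    a_hat = z * (np.sqrt(d) / (d + gamma1))       # ridge estimate of a
    b_hat = ((m0 - m1) @ U) * (np.sqrt(d) / (d + gamma2))  # ridge estimate of b
    return a_hat @ b_hat + model["log_prior"]     # shape (k,)
```

Under these assumptions, a test point `x` would be assigned to class 0 when `r2lda_score(model, x, gamma1, gamma2)` is positive and to class 1 otherwise. Working in the eigenbasis of the sample covariance means that scanning many candidate values of the two parameters requires only one eigendecomposition.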