Estimating the covariance matrices of multiple classes from limited training data is a difficult problem. The sample covariance matrix (SCM) is known to perform poorly when the number of variables is large compared to the number of available samples. To reduce the mean squared error (MSE) of the SCM, regularized (shrinkage) SCM estimators are often used. In this work, we consider regularized SCM (RSCM) estimators for multiclass problems that couple two different target matrices for regularization: the pooled (average) SCM of the classes and the scaled identity matrix. Regularization toward the pooled SCM is beneficial when the population covariances are similar, whereas regularization toward the identity matrix guarantees that the estimators are positive definite. We derive the MSE-optimal tuning parameters for the estimators and propose a method for estimating them under the assumption that the class populations follow (unspecified) elliptical distributions with finite fourth-order moments. The MSE performance of the proposed coupled RSCMs is evaluated with simulations and in a regularized discriminant analysis (RDA) classification setup on real data. Results on three different real data sets indicate performance comparable to cross-validation, but with a significant reduction in computation time.
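As an illustration of what coupling the two targets can look like, the following is a minimal sketch and not the estimator as defined in the paper. It assumes a two-step convex combination: each class SCM is first mixed with the pooled SCM using a weight `alpha`, and the result is then shrunk toward a trace-scaled identity using a weight `beta`. The function name `coupled_rscm`, the parameter names, the sample-size weighting of the pooled SCM, and the use of fixed shared values for `alpha` and `beta` are all assumptions made for illustration; the paper derives MSE-optimal tuning parameters, which this sketch does not attempt.

```python
import numpy as np


def coupled_rscm(class_samples, alpha, beta):
    """Hypothetical coupled shrinkage of class covariance matrices.

    class_samples: list of arrays, each of shape (n_k, p), one per class.
    alpha: weight in [0, 1] on the class SCM versus the pooled SCM.
    beta:  weight in [0, 1] on the mixed matrix versus the scaled identity.
    Returns a list of (p, p) covariance estimates, one per class.
    """
    # Unbiased sample covariance matrix of each class (rows are observations).
    scms = [np.cov(X, rowvar=False) for X in class_samples]

    # Pooled SCM as a sample-size-weighted average (an assumed weighting).
    sizes = np.array([X.shape[0] for X in class_samples], dtype=float)
    weights = sizes / sizes.sum()
    pooled = sum(w * S for w, S in zip(weights, scms))

    p = pooled.shape[0]
    estimates = []
    for S in scms:
        # Step 1: regularize the class SCM toward the pooled SCM.
        mixed = alpha * S + (1.0 - alpha) * pooled
        # Step 2: regularize toward the identity scaled by the average eigenvalue.
        target = (np.trace(mixed) / p) * np.eye(p)
        estimates.append(beta * mixed + (1.0 - beta) * target)
    return estimates


# Usage: three classes with fewer samples (10) than variables (20).
rng = np.random.default_rng(0)
data = [rng.standard_normal((10, 20)) for _ in range(3)]
estimates = coupled_rscm(data, alpha=0.5, beta=0.8)
smallest_eigs = [np.linalg.eigvalsh(E).min() for E in estimates]
```

With `beta` strictly below 1, every eigenvalue of each estimate is bounded below by `(1 - beta)` times the average eigenvalue of the mixed matrix, so the estimates are positive definite even though each class SCM is singular in this setting.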