In this paper, we consider the general problem of testing the means of two high-dimensional distributions with a common, unknown covariance using a linear classifier. Traditionally, such a classifier is formed from the sample covariance matrix of some given training data, but, as is well known, its performance is poor when the number of training samples $n$ is not much larger than the data dimension $p$. We thus seek a covariance estimator to replace the sample covariance. To account for the fact that $n$ and $p$ may be of comparable size, we adopt the "large-dimensional asymptotic model" in which $n$ and $p$ go to infinity in a fixed ratio. Under this assumption, we identify a covariance estimator that is detection-theoretically optimal within the general shrinkage class of C. Stein, and we give consistent estimates for the corresponding classifier's type-I and type-II errors.
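As a rough illustration of the setting described in the abstract, and not the estimator derived in the paper, the following sketch builds a linear classifier from a pooled sample covariance whose eigenvalues are shrunk toward their mean. It assumes that Stein's shrinkage class means estimators that keep the sample eigenvectors and modify only the eigenvalues. The function names and the tuning parameter `alpha` are hypothetical, and the simple linear shrinkage rule stands in for the optimal eigenvalue choice the paper identifies.

```python
import numpy as np


def pooled_covariance(x0, x1):
    """Pooled sample covariance of two classes assumed to share one covariance."""
    n0, n1 = len(x0), len(x1)
    c0 = x0 - x0.mean(axis=0)
    c1 = x1 - x1.mean(axis=0)
    return (c0.T @ c0 + c1.T @ c1) / (n0 + n1 - 2)


def shrink_eigenvalues(s, alpha):
    """Keep the sample eigenvectors and pull the eigenvalues toward their mean.

    alpha in [0, 1] is a hypothetical tuning knob (0 = sample covariance,
    1 = scaled identity); it is NOT the optimal choice derived in the paper.
    """
    evals, evecs = np.linalg.eigh(s)
    shrunk = (1.0 - alpha) * evals + alpha * evals.mean()
    # Scale each eigenvector column by its shrunk eigenvalue, then recombine.
    return (evecs * shrunk) @ evecs.T


def fit_linear_classifier(x0, x1, alpha=0.5):
    """Return (w, b) such that w @ x + b > 0 assigns x to class 1."""
    m0, m1 = x0.mean(axis=0), x1.mean(axis=0)
    sigma_hat = shrink_eigenvalues(pooled_covariance(x0, x1), alpha)
    w = np.linalg.solve(sigma_hat, m1 - m0)
    b = -0.5 * w @ (m0 + m1)
    return w, b


def empirical_errors(w, b, test0, test1):
    """Empirical type-I (class 0 called 1) and type-II (class 1 called 0) rates."""
    type_1 = np.mean(test0 @ w + b > 0)
    type_2 = np.mean(test1 @ w + b <= 0)
    return type_1, type_2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p, n = 100, 20  # dimension larger than the per-class sample size
    shift = 0.3     # illustrative mean separation per coordinate
    x0 = rng.standard_normal((n, p))
    x1 = rng.standard_normal((n, p)) + shift
    w, b = fit_linear_classifier(x0, x1, alpha=0.5)
    t0 = rng.standard_normal((1000, p))
    t1 = rng.standard_normal((1000, p)) + shift
    print(empirical_errors(w, b, t0, t1))
```

With `alpha = 0` the estimator reduces to the pooled sample covariance, which is singular whenever $p$ exceeds the total number of training samples minus two, so the linear solve then fails or is numerically unstable. Any `alpha > 0` keeps the estimate invertible, which is one simple way to see why replacing the sample covariance matters when $n$ and $p$ are comparable.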