This paper considers the problem of testing for latent structure in large symmetric data matrices. The goal is to develop statistically principled methodology that is flexible in its applicability, computationally efficient, and insensitive to extreme data variation, thereby overcoming limitations of existing approaches. To do so, we introduce and systematically study certain symmetric matrices, called Wilcoxon--Wigner random matrices, whose entries are normalized rank statistics derived from an underlying independent and identically distributed sample of absolutely continuous random variables. These matrices arise naturally as the matricization of one-sample problems in statistics and conceptually lie at the interface of nonparametrics, multivariate analysis, and data reduction. Among our results, we establish that the leading eigenvalue and corresponding eigenvector of Wilcoxon--Wigner random matrices admit asymptotically Gaussian fluctuations with explicit centering and scaling terms. These asymptotic results enable rigorous parameter-free and distribution-free spectral methodology for addressing two hypothesis testing problems, namely community detection and principal submatrix detection. Numerical examples illustrate the performance of the proposed approach. Throughout, our findings are juxtaposed with existing results based on the spectral properties of independent-entry symmetric random matrices in signal-plus-noise data settings.
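The construction described above can be illustrated with a minimal sketch. It draws an i.i.d. continuous sample, replaces each value by its rank, normalizes the ranks, and arranges them in a symmetric matrix before computing the leading eigenvalue by power iteration. Several details are assumptions made for illustration and are not taken from the paper: the normalization `rank / (N + 1)` with `N = n(n+1)/2`, the inclusion of the diagonal, the row-by-row filling order, and the function names `wilcoxon_wigner` and `leading_eigenvalue`. The paper's exact definition may differ.

```python
import random


def wilcoxon_wigner(n, seed=0):
    """Return an n x n symmetric matrix of normalized ranks (illustrative only)."""
    rng = random.Random(seed)
    big_n = n * (n + 1) // 2  # entries on and above the diagonal
    # Any absolutely continuous law may be used; ranks do not depend on it.
    sample = [rng.gauss(0.0, 1.0) for _ in range(big_n)]

    # Rank the sample: ranks[k] in 1..big_n (ties have probability zero).
    order = sorted(range(big_n), key=sample.__getitem__)
    ranks = [0] * big_n
    for r, idx in enumerate(order, start=1):
        ranks[idx] = r

    # Fill the upper triangle row by row and mirror it.
    matrix = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i, n):
            value = ranks[k] / (big_n + 1)  # assumed normalization into (0, 1)
            matrix[i][j] = value
            matrix[j][i] = value
            k += 1
    return matrix


def leading_eigenvalue(matrix, iters=100):
    """Power iteration; applicable here because every entry is positive."""
    n = len(matrix)
    v = [1.0 / n ** 0.5] * n  # unit-norm starting vector
    lam = 0.0
    for _ in range(iters):
        w = [sum(row[j] * v[j] for j in range(n)) for row in matrix]
        lam = sum(w[i] * v[i] for i in range(n))  # Rayleigh quotient of v
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return lam, v


n = 60
W = wilcoxon_wigner(n)
lam, vec = leading_eigenvalue(W)
print(f"n = {n}, leading eigenvalue = {lam:.3f}")
```

Because the normalized ranks average exactly 1/2 under this normalization, the leading eigenvalue in the sketch sits near n/2, and the corresponding eigenvector is close to the constant vector. The distribution of the matrix does not depend on which continuous law generates the sample, which is the sense in which rank-based methodology of this kind is distribution-free.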