This paper studies a high-dimensional inference problem involving the matrix tensor product of random matrices. This problem generalizes a number of contemporary data science problems, including the spiked matrix models used in sparse principal component analysis and covariance estimation, and the stochastic block model used in network analysis. The main results are single-letter formulas (i.e., analytical expressions that can be approximated numerically) for the mutual information and the minimum mean-squared error (MMSE) in the Bayes-optimal setting, where the distributions of all random quantities are known. We provide non-asymptotic bounds and show that our formulas describe exactly the leading-order terms in the mutual information and MMSE in the high-dimensional regime where the number of rows $n$ and the number of columns $d$ scale with $d = O(n^\alpha)$ for some $\alpha < 1/20$. On the technical side, this paper introduces new techniques for the analysis of high-dimensional matrix-valued signals. Specific contributions include a novel extension of the adaptive interpolation method that uses order-preserving positive semidefinite interpolation paths, and a variance inequality between the overlap and the free energy that is based on continuous-time I-MMSE relations.