We study the problem of estimating a rank-one matrix from Gaussian observations where different blocks of the matrix are observed under different noise levels. This problem is motivated by applications in clustering and community detection where latent variables can be partitioned into a fixed number of known groups (e.g., users and items) and the blocks of the matrix correspond to different types of pairwise interactions (e.g., user-user, user-item, or item-item interactions). In the setting where the number of blocks is fixed while the number of variables tends to infinity, we prove asymptotically exact formulas for the minimum mean-squared error in estimating both the matrix and the latent variables. These formulas describe the weak recovery thresholds for the problem and reveal invariance properties with respect to certain scalings of the noise variance. We also derive an approximate message passing algorithm and a gradient descent algorithm and show empirically that these algorithms achieve the information-theoretic limits in certain regimes.
翻译:我们研究从高斯观察中估计一等矩阵的问题,在不同的噪音水平下观察到矩阵的不同区块。这个问题的起因是群集和社区探测应用,潜伏变量可以分成固定数目的已知组群(如用户和项目),矩阵的区块与不同类型的对称互动(如用户-用户、用户-项目或项目-项目互动)相对应。在确定区块数目而变量数趋向无限的设置中,我们证明在估计矩阵和潜在变量时,对最小平均差错的公式不那么精确。这些公式描述了问题的回收阈值薄弱,并揭示了噪音差异的某些尺度的不变化性。我们还得出了一种大致的信息传递算法和梯度下位算法,并用经验显示这些算法在某些制度中达到了信息-理论限度。