Higher-order tensor datasets arise commonly in recommendation systems, neuroimaging, and social networks. Here we develop provable methods for estimating a possibly high-rank signal tensor from noisy observations. We consider a generative latent variable tensor model that incorporates both high-rank and low-rank models, including but not limited to simple hypergraphon models, single index models, low-rank CP models, and low-rank Tucker models. Comprehensive results are developed on both the statistical and computational limits of signal tensor estimation. We find that high-dimensional latent variable tensors are of log-rank; this fact explains the pervasiveness of low-rank tensors in applications. Furthermore, we propose a polynomial-time spectral algorithm that achieves the computationally optimal rate. We show that the statistical-computational gap emerges only for latent variable tensors of order 3 or higher. Numerical experiments and two real data applications are presented to demonstrate the practical merits of our methods.
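To make the setting concrete, the following is a minimal sketch of a latent variable tensor and a generic spectral denoiser. It is not the algorithm proposed in the paper: it substitutes a plain truncated higher-order SVD, and the sigmoid link, the uniform latent variables, and the values `n=40`, `sigma=0.5`, and `rank=2` are all illustrative assumptions.

```python
import numpy as np


def unfold(T, mode):
    """Matricize T along `mode`: rows index that mode, columns index the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)


def mode_product(T, M, mode):
    """Multiply tensor T by matrix M along axis `mode`."""
    out = np.tensordot(M, T, axes=(1, mode))  # contracted axis lands at position 0
    return np.moveaxis(out, 0, mode)


def hosvd_denoise(Y, rank):
    """Project Y onto the top-`rank` left singular subspace of each unfolding.

    Generic truncated HOSVD, used here as a stand-in spectral estimator.
    """
    est = Y
    for mode in range(Y.ndim):
        U, _, _ = np.linalg.svd(unfold(Y, mode), full_matrices=False)
        P = U[:, :rank] @ U[:, :rank].T  # orthogonal projector for this mode
        est = mode_product(est, P, mode)
    return est


def simulate(n=40, sigma=0.5, seed=0):
    """Order-3 signal Theta[i, j, k] = f(x_i, x_j, x_k) plus Gaussian noise.

    Assumed for illustration: x_i ~ Uniform[0, 1] and f = sigmoid of the sum.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(size=n)
    s = x[:, None, None] + x[None, :, None] + x[None, None, :]
    signal = 1.0 / (1.0 + np.exp(-s))
    noisy = signal + sigma * rng.normal(size=signal.shape)
    return signal, noisy


signal, noisy = simulate()
estimate = hosvd_denoise(noisy, rank=2)
mse_raw = np.mean((noisy - signal) ** 2)
mse_est = np.mean((estimate - signal) ** 2)
print(f"MSE of raw observations: {mse_raw:.4f}")
print(f"MSE after spectral truncation: {mse_est:.4f}")
```

The signal here is not exactly low rank, since the sigmoid is a nonlinear function of the latent variables, yet a small truncation rank already captures most of it. That is the kind of behavior the log-rank result describes, though this toy example does not verify any of the paper's rates.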