We investigate a generalized framework for estimating a latent low-rank plus sparse tensor, where the low-rank tensor typically captures the multi-way principal components and the sparse tensor accounts for potential model mis-specifications or heterogeneous signals that the low-rank part cannot explain. The framework is flexible: it covers both linear and non-linear models and can easily handle continuous or categorical variables. We propose a fast algorithm that integrates Riemannian gradient descent with a novel gradient pruning procedure. Under suitable conditions, the algorithm converges linearly and simultaneously estimates both the low-rank and sparse tensors. The statistical error bounds of the final estimates are established in terms of the gradient of the loss function. The error bounds are generally sharp under specific statistical models, e.g., robust tensor PCA and community detection in hypergraph networks with outlier vertices. Moreover, our method achieves non-trivial error bounds for heavy-tailed tensor PCA whenever the noise has a finite $2+\varepsilon$ moment. We apply our method to the international trade flow dataset and the statistician hypergraph co-authorship network, both yielding new and interesting findings.
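To make the low-rank plus sparse idea concrete, the following is a minimal sketch for the robust tensor PCA case with squared loss. It is not the algorithm proposed here: the Riemannian gradient step is replaced by a plain gradient step followed by a truncated-HOSVD projection, and the gradient pruning is replaced by keeping the `k` largest-magnitude residual entries. All function names, the step size, the ranks, and the sparsity level `k` are illustrative assumptions.

```python
import numpy as np

def unfold(T, mode):
    # Mode-`mode` matricization: bring that axis to the front, flatten the rest.
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd_truncate(T, ranks):
    # Project T onto multilinear rank at most `ranks` via truncated HOSVD
    # (a simple stand-in for a Riemannian retraction; an assumption).
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])
    out = T
    for mode, U in enumerate(factors):
        P = U @ U.T  # orthogonal projector onto the leading mode-`mode` subspace
        out = np.moveaxis(np.tensordot(P, out, axes=(1, mode)), 0, mode)
    return out

def keep_largest(T, k):
    # Keep the k largest-magnitude entries of T and zero out the rest.
    flat = np.zeros(T.size)
    if k > 0:
        src = T.ravel()
        idx = np.argpartition(np.abs(src), -k)[-k:]
        flat[idx] = src[idx]
    return flat.reshape(T.shape)

def lowrank_plus_sparse(A, ranks, k, step=1.0, n_iter=50):
    # Alternate between a sparse update and a projected gradient step
    # for the squared loss 0.5 * ||T + S - A||^2.
    T = np.zeros_like(A)
    S = np.zeros_like(A)
    for _ in range(n_iter):
        S = keep_largest(A - T, k)           # sparse part: largest residuals
        grad = T + S - A                     # gradient of the squared loss in T
        T = hosvd_truncate(T - step * grad, ranks)
    return T, S

# Synthetic example (all sizes and values are hypothetical).
rng = np.random.default_rng(0)
ranks, k = (2, 2, 2), 10
core = rng.standard_normal(ranks)
U1, U2, U3 = (rng.standard_normal((8, 2)) for _ in range(3))
T_true = np.einsum("abc,ia,jb,kc->ijk", core, U1, U2, U3)
S_true = np.zeros(T_true.size)
S_true[rng.choice(T_true.size, size=k, replace=False)] = 10.0
A = T_true + S_true.reshape(T_true.shape) + 0.01 * rng.standard_normal(T_true.shape)

T_hat, S_hat = lowrank_plus_sparse(A, ranks, k)
print("relative error of low-rank estimate:",
      np.linalg.norm(T_hat - T_true) / np.linalg.norm(T_true))
```

The sketch enforces the two structural constraints directly: `T_hat` has multilinear rank at most `ranks`, and `S_hat` has at most `k` nonzero entries. How well it recovers the true components depends on the signal strength, the outlier magnitudes, and the chosen `k`, none of which are tuned here.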