The low-rank canonical polyadic tensor decomposition is useful in data analysis and can be computed by solving a sequence of overdetermined least squares subproblems. Motivated by consideration of sparse tensors, we propose sketching each subproblem using leverage scores to select a subset of the rows, with probabilistic guarantees on the solution accuracy. We randomly sample rows proportional to leverage score upper bounds that can be efficiently computed using the special Khatri-Rao subproblem structure inherent in tensor decomposition. Crucially, for a $(d+1)$-way tensor, the number of rows in the sketched system is $O(r^d/\epsilon)$ for a decomposition of rank $r$ and $\epsilon$-accuracy in the least squares solve, independent of both the size and the number of nonzeros in the tensor. Along the way, we provide a practical solution to the generic matrix sketching problem of sampling overabundance for high-leverage-score rows, proposing to include such rows deterministically and combine repeated samples in the sketched system; we conjecture that this can lead to improved theoretical bounds. Numerical results on real-world large-scale tensors show the method is significantly faster than deterministic methods at nearly the same level of accuracy.
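The sampling idea can be illustrated with a minimal NumPy sketch. It relies on the standard fact that the leverage score of a Khatri-Rao product row is bounded by the product of the leverage scores of the corresponding factor matrix rows, so a multi-index can be drawn one mode at a time without forming the full product. The function names, the `sample_rhs` callback, and the small dense test problem are assumptions made for illustration and do not come from the paper. The deterministic inclusion of high-leverage rows and the combining of repeated samples described above are omitted.

```python
import numpy as np


def leverage_scores(A):
    """Row leverage scores of A, assuming A has full column rank."""
    Q, _ = np.linalg.qr(A)          # reduced QR: Q has orthonormal columns
    return np.sum(Q**2, axis=1)     # squared row norms of Q; they sum to r


def sketched_krp_lstsq(factors, sample_rhs, num_samples, rng):
    """Solve min_X ||Z X - B||_F approximately, where Z is the Khatri-Rao
    product of `factors`, using rows sampled by leverage score upper bounds.

    `sample_rhs(idx)` is a hypothetical callback returning the rows of B that
    match the sampled multi-indices `idx` (one index array per factor).
    """
    r = factors[0].shape[1]
    idx = []
    prob = np.ones(num_samples)
    Z_s = np.ones((num_samples, r))
    for A in factors:
        p = leverage_scores(A)
        p = p / p.sum()                         # per-mode sampling distribution
        i = rng.choice(A.shape[0], size=num_samples, p=p)
        idx.append(i)
        prob *= p[i]                            # joint probability of the multi-index
        Z_s *= A[i, :]                          # Khatri-Rao row = Hadamard product of factor rows
    w = 1.0 / np.sqrt(num_samples * prob)       # standard importance-sampling rescaling
    B_s = sample_rhs(idx)
    X, *_ = np.linalg.lstsq(w[:, None] * Z_s, w[:, None] * B_s, rcond=None)
    return X


# Small consistent test problem, so the sketched solve should recover X_true.
rng = np.random.default_rng(0)
n1, n2, r, m = 30, 40, 3, 5
A1 = rng.standard_normal((n1, r))
A2 = rng.standard_normal((n2, r))
Z = np.einsum("ir,jr->ijr", A1, A2).reshape(n1 * n2, r)   # full Khatri-Rao product
X_true = rng.standard_normal((r, m))
B = Z @ X_true

X_hat = sketched_krp_lstsq(
    [A1, A2],
    lambda idx: B[idx[0] * n2 + idx[1], :],   # row (i1, i2) of Z sits at i1 * n2 + i2
    num_samples=200,
    rng=rng,
)
print(np.max(np.abs(X_hat - X_true)))
```

The full product `Z` is built here only to generate a right-hand side for the demonstration. The sketched solve itself touches only the sampled factor rows, which is why the cost of the sketched system does not depend on the tensor size.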