The Tucker tensor decomposition is a natural extension of the singular value decomposition (SVD) to multiway data. We propose to accelerate Tucker tensor decomposition algorithms by using randomization and parallelization. We present two algorithms that scale to large data and many processors, significantly reduce both computation and communication cost compared to previous deterministic and randomized approaches, and obtain nearly the same approximation errors. The key idea in our algorithms is to perform randomized sketches with Kronecker-structured random matrices, which reduces computation compared to unstructured matrices and can be implemented using a fundamental tensor computational kernel. We provide probabilistic error analysis of our algorithms and implement a new parallel algorithm for the structured randomized sketch. Our experimental results demonstrate that our combination of randomization and parallelization achieves accurate Tucker decompositions much faster than alternative approaches. We observe up to a 16X speedup over the fastest deterministic parallel implementation on 3D simulation data.
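To make the key idea concrete, below is a minimal NumPy sketch (not the authors' implementation) of a randomized HOSVD-style Tucker decomposition that uses a Kronecker-structured sketch: each mode is sketched by contracting every other mode with a small Gaussian matrix, so the structured sketch is applied as a sequence of tensor-times-matrix (TTM) products rather than as one large unstructured matrix. The function names, ranks, and oversampling parameter are illustrative assumptions.

```python
import numpy as np

def ttm(X, A, mode):
    """Tensor-times-matrix: multiply tensor X by matrix A along `mode`."""
    X = np.moveaxis(X, mode, 0)
    shape = X.shape
    Y = A @ X.reshape(shape[0], -1)
    return np.moveaxis(Y.reshape((A.shape[0],) + shape[1:]), 0, mode)

def randomized_tucker(X, ranks, oversample=5, seed=0):
    """Randomized Tucker sketch using Kronecker-structured random matrices."""
    rng = np.random.default_rng(seed)
    d = X.ndim
    # One small Gaussian matrix per mode; their Kronecker product is the
    # structured sketch matrix, but it is never formed explicitly.
    omegas = [rng.standard_normal((X.shape[k], ranks[k] + oversample))
              for k in range(d)]
    factors = []
    for k in range(d):
        # Sketch by contracting every mode except k with its small Gaussian:
        # a sequence of TTMs (the "multi-TTM" kernel mentioned in the abstract).
        Y = X
        for m in range(d):
            if m != k:
                Y = ttm(Y, omegas[m].T, m)
        # Unfold the sketched tensor along mode k and take leading singular vectors.
        Yk = np.moveaxis(Y, k, 0).reshape(Y.shape[k], -1)
        Uk, _, _ = np.linalg.svd(Yk, full_matrices=False)
        factors.append(Uk[:, :ranks[k]])
    # Core tensor: contract X with the transposed factor matrices.
    G = X
    for k in range(d):
        G = ttm(G, factors[k].T, k)
    return G, factors

# Usage: build a tensor with low multilinear rank and check the recovery error.
rng = np.random.default_rng(1)
core = rng.standard_normal((10, 10, 10))
facs = [rng.standard_normal((n, 10)) for n in (40, 50, 60)]
X = core
for k in range(3):
    X = ttm(X, facs[k], k)
G, U = randomized_tucker(X, ranks=(10, 10, 10))
X_hat = G
for k in range(3):
    X_hat = ttm(X_hat, U[k], k)
print("relative error:", np.linalg.norm(X - X_hat) / np.linalg.norm(X))
```

Because the small Gaussian matrices are applied mode by mode, the sketch costs a series of TTMs on progressively smaller tensors instead of a single dense multiplication by a large unstructured random matrix, which is the computational saving the abstract refers to.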