Multiple Tensor-Times-Matrix (Multi-TTM) is a key computation in algorithms for computing and operating with the Tucker tensor decomposition, which is frequently used in multidimensional data analysis. We establish communication lower bounds that determine how much data movement is required to perform the Multi-TTM computation in parallel. The crux of the proof relies on analytically solving a constrained, nonlinear optimization problem. We also present a parallel algorithm to perform this computation that organizes the processors into a logical grid with twice as many modes as the input tensor. We show that with correct choices of grid dimensions, the communication cost of the algorithm attains the lower bounds and is therefore communication optimal. Finally, we show that our algorithm can significantly reduce communication compared to the straightforward approach of expressing the computation as a sequence of tensor-times-matrix operations.
翻译:多重时空- Matrix (Multi-TTM) 是计算和操作使用塔克高尔分解的算法的关键计算方法, 后者经常用于多维数据分析。 我们建立了通信下限, 确定需要多少数据流动才能平行进行多端TTM计算。 证据的柱子依赖于分析解决一个受限制的非线性优化问题。 我们还提出了一个平行算法, 来进行这一计算, 将处理器组织成一个逻辑网格, 其模式比输入喇叭多一倍。 我们显示, 如果对网格尺寸作出正确的选择, 算法的通信成本会达到较低的界限, 因此是最佳的通信。 最后, 我们显示, 我们的算法可以大大减少通信量, 而不是直接表达计算方法, 将计算方法表达成数时速矩阵操作的顺序。