We present efficient and scalable parallel algorithms for mathematical operations on low-rank tensors represented in the tensor train (TT) format. We consider algorithms for addition, elementwise multiplication, computing norms and inner products, orthogonalization, and rounding (rank truncation). These are the kernel operations for applications such as iterative Krylov solvers that exploit the TT structure. The parallel algorithms are designed for distributed-memory computation, and we use a data distribution and strategy that parallelizes the computations for each individual TT core. We analyze the computation and communication costs of the proposed algorithms to show their scalability, and we present numerical experiments that demonstrate their efficiency on both shared-memory and distributed-memory parallel systems. For example, we observe better single-core performance than the existing MATLAB TT-Toolbox in rounding a 2 GB TT tensor, and our implementation achieves a $34\times$ speedup using all 40 cores of a single node. We also show nearly linear parallel scaling on larger TT tensors to more than 10,000 cores for all mathematical operations.
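To make the TT format and one of the listed kernels concrete, the following is a minimal serial sketch in plain Python. It is not the parallel algorithm of the paper. It assumes each TT core is stored as a nested list `core[a][i][b]` of shape $r_{k-1} \times n_k \times r_k$ with $r_0 = r_d = 1$, and the helper names `rank1`, `tt_entry`, and `tt_add` are hypothetical names chosen for this illustration. It shows that formal TT addition stacks the cores block-wise, so the TT ranks of the sum are the sums of the ranks of the operands, which is why rounding (rank truncation) typically follows.

```python
from itertools import product


def rank1(vectors):
    """Build a rank-1 TT tensor (all TT ranks 1) from a list of vectors."""
    return [[[[v] for v in vec]] for vec in vectors]


def tt_entry(cores, index):
    """Evaluate one tensor entry as a product of core slices."""
    vec = [1.0]  # running row vector of length r_{k-1}; r_0 = 1
    for core, i in zip(cores, index):
        r_right = len(core[0][i])
        vec = [sum(vec[a] * core[a][i][b] for a in range(len(vec)))
               for b in range(r_right)]
    return vec[0]


def tt_add(x, y):
    """Formal TT addition: block-wise stacking of cores, ranks add up."""
    d = len(x)
    out = []
    for k, (cx, cy) in enumerate(zip(x, y)):
        rx_l, n, rx_r = len(cx), len(cx[0]), len(cx[0][0])
        ry_l, ry_r = len(cy), len(cy[0][0])
        first, last = k == 0, k == d - 1
        # Boundary ranks stay 1; interior ranks are summed.
        r_l = 1 if first else rx_l + ry_l
        r_r = 1 if last else rx_r + ry_r
        # Offsets of the block that holds y's core.
        off_l = 0 if first else rx_l
        off_r = 0 if last else rx_r
        core = [[[0.0] * r_r for _ in range(n)] for _ in range(r_l)]
        for i in range(n):
            for a in range(rx_l):
                for b in range(rx_r):
                    core[a][i][b] += cx[a][i][b]
            for a in range(ry_l):
                for b in range(ry_r):
                    core[off_l + a][i][off_r + b] += cy[a][i][b]
        out.append(core)
    return out


# Two small 2 x 2 x 2 rank-1 tensors and their TT sum.
x = rank1([[1, 2], [3, 4], [5, 6]])
y = rank1([[1, 0], [0, 1], [2, 2]])
z = tt_add(x, y)

# Interior TT ranks of z are 1 + 1 = 2.
ranks = [len(core[0][0]) for core in z[:-1]]

# Every entry of z equals the sum of the corresponding entries of x and y.
ok = all(tt_entry(z, idx) == tt_entry(x, idx) + tt_entry(y, idx)
         for idx in product(range(2), repeat=3))
```

In this sketch the addition itself requires no arithmetic beyond copying blocks; the cost shows up later, because every subsequent operation works with the larger ranks until the result is rounded.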