Parallel Algorithms for Tensor Train Arithmetic - 专知 Papers


September 7, 2021

Parallel Algorithms for Tensor Train Arithmetic


Hussam Al Daas, Grey Ballard, Peter Benner

We present efficient and scalable parallel algorithms for performing mathematical operations for low-rank tensors represented in the tensor train (TT) format. We consider algorithms for addition, elementwise multiplication, computing norms and inner products, orthogonalization, and rounding (rank truncation). These are the kernel operations for applications such as iterative Krylov solvers that exploit the TT structure. The parallel algorithms are designed for distributed-memory computation, and we use a data distribution and strategy that parallelizes computations for individual cores within the TT format. We analyze the computation and communication costs of the proposed algorithms to show their scalability, and we present numerical experiments that demonstrate their efficiency on both shared-memory and distributed-memory parallel systems. For example, we observe better single-core performance than the existing MATLAB TT-Toolbox in rounding a 2GB TT tensor, and our implementation achieves a $34\times$ speedup using all 40 cores of a single node. We also show nearly linear parallel scaling on larger TT tensors up to over 10,000 cores for all mathematical operations.
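To make the kernel operations named in the abstract concrete, the following is a minimal sequential NumPy sketch of TT addition, inner product, norm, and rounding. It is not the authors' distributed-memory implementation: it follows the standard textbook formulation (block-structured cores for addition, a QR sweep followed by a truncated SVD sweep for rounding). All function names (`tt_random`, `tt_full`, `tt_add`, `tt_inner`, `tt_round`) and the core layout `(r_left, n, r_right)` are assumptions made for illustration, and the sketch assumes tensors with at least two modes.

```python
import numpy as np

def tt_random(dims, ranks, seed=0):
    """Random TT tensor; core k has shape (r_{k-1}, n_k, r_k) with r_0 = r_N = 1."""
    rng = np.random.default_rng(seed)
    r = [1, *ranks, 1]
    return [rng.standard_normal((r[k], n, r[k + 1])) for k, n in enumerate(dims)]

def tt_full(cores):
    """Contract all cores into the dense tensor (only feasible for tiny examples)."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=1)  # last axis of out with first of core
    return out[0, ..., 0]  # drop the two boundary ranks of size 1

def tt_add(a, b):
    """TT addition: concatenate boundary cores, block-diagonal interior cores."""
    n = len(a)
    out = []
    for k, (x, y) in enumerate(zip(a, b)):
        if k == 0:
            out.append(np.concatenate([x, y], axis=2))
        elif k == n - 1:
            out.append(np.concatenate([x, y], axis=0))
        else:
            z = np.zeros((x.shape[0] + y.shape[0], x.shape[1], x.shape[2] + y.shape[2]))
            z[:x.shape[0], :, :x.shape[2]] = x
            z[x.shape[0]:, :, x.shape[2]:] = y
            out.append(z)
    return out

def tt_inner(a, b):
    """Inner product <a, b>, contracted core by core without forming dense tensors."""
    w = np.ones((1, 1))
    for x, y in zip(a, b):
        w = np.einsum('ij,ink,jnl->kl', w, x, y)
    return w[0, 0]

def tt_norm(a):
    """Frobenius norm computed from the TT inner product."""
    return np.sqrt(tt_inner(a, a))

def tt_round(cores, tol):
    """Rank truncation with relative error at most tol (sequential sketch)."""
    cores = [c.copy() for c in cores]
    n = len(cores)
    # Left-to-right QR sweep: cores 0..n-2 become left-orthogonal.
    for k in range(n - 1):
        r0, m, r1 = cores[k].shape
        q, r = np.linalg.qr(cores[k].reshape(r0 * m, r1))
        cores[k] = q.reshape(r0, m, q.shape[1])
        cores[k + 1] = np.tensordot(r, cores[k + 1], axes=1)
    # After the sweep, the last core carries the norm of the whole tensor.
    delta = tol * np.linalg.norm(cores[-1]) / np.sqrt(n - 1)
    # Right-to-left truncated SVD sweep.
    for k in range(n - 1, 0, -1):
        r0, m, r1 = cores[k].shape
        u, s, vt = np.linalg.svd(cores[k].reshape(r0, m * r1), full_matrices=False)
        tail = np.sqrt(np.cumsum(s[::-1] ** 2))[::-1]  # tail[i] = norm of s[i:]
        rank = max(1, int(np.sum(tail > delta)))
        cores[k] = vt[:rank].reshape(rank, m, r1)
        cores[k - 1] = np.tensordot(cores[k - 1], u[:, :rank] * s[:rank], axes=1)
    return cores

# Small demonstration: adding a tensor to itself doubles the stored ranks,
# and rounding recovers a compact representation.
a = tt_random((3, 4, 5), (2, 3), seed=1)
doubled = tt_add(a, a)
rounded = tt_round(doubled, tol=1e-10)
print([c.shape for c in doubled])
print([c.shape for c in rounded])
```

Addition only concatenates cores, so the stored ranks add up with each sum; this is why rounding is typically applied after every few additions in iterative solvers that keep their iterates in TT format.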

