The emergence of novel hardware accelerators has powered the tremendous growth of machine learning in recent years. These accelerators deliver incomparable performance gains in processing high-volume matrix operations, particularly matrix multiplication, a core component of neural network training and inference. In this work, we explore opportunities for accelerating database systems using NVIDIA's Tensor Core Units (TCUs). We present TCUDB, a TCU-accelerated query engine that processes a set of query operators, including natural joins and group-by aggregates, as matrix operations within TCUs. Although matrix multiplication was considered an inefficient strategy for query processing in the past, it has remained largely unexplored in conventional GPU-based databases, which primarily rely on vector or scalar processing. We demonstrate the significant performance gain of TCUDB in a range of real-world applications, including entity matching, graph query processing, and matrix-based data analytics. TCUDB achieves up to 288x speedup compared to a baseline GPU-based query engine.
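The core idea of casting a natural join as matrix multiplication can be illustrated with a minimal toy sketch (this is an assumption-laden illustration of the general technique, not TCUDB's actual implementation): each relation is encoded as a 0/1 matrix indexed by the join attribute's domain, and the product counts the join witnesses for every output pair.

```python
import numpy as np

# Toy sketch: natural join R(x, y) JOIN S(y, z) via matrix multiplication.
# Relations and domain sizes below are made up for illustration.
R = [(0, 1), (1, 1), (2, 0)]          # tuples (x, y)
S = [(0, 2), (1, 0), (1, 2)]          # tuples (y, z)
nx, ny, nz = 3, 2, 3                  # sizes of the x, y, z domains

A = np.zeros((nx, ny), dtype=np.int32)
B = np.zeros((ny, nz), dtype=np.int32)
for x, y in R:
    A[x, y] = 1                       # A[x, y] = 1 iff (x, y) in R
for y, z in S:
    B[y, z] = 1                       # B[y, z] = 1 iff (y, z) in S

# C[x, z] = number of join witnesses |{y : (x,y) in R and (y,z) in S}|;
# on a GPU this product is exactly what a tensor core unit accelerates.
C = A @ B

join = [(x, z) for x in range(nx) for z in range(nz) if C[x, z] > 0]
print(sorted(join))                   # -> [(0, 0), (0, 2), (1, 0), (1, 2), (2, 2)]
```

Note that the same product also yields aggregate information for free: summing `C` along an axis gives a group-by count over the join result, which is one reason this formulation pairs naturally with group-by aggregates.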