FT-GEMM: A Fault Tolerant High Performance GEMM Implementation on x86 CPUs

May 9, 2023

Shixun Wu,Yujia Zhai,Jiajun Huang,Zizhe Jian,Zizhong Chen

From arXiv. arXiv admin note: substantial text overlap with arXiv:2104.00897

General matrix-matrix multiplication (GEMM) is crucial for scientific computing and machine learning. However, the increased scale of computing platforms raises concerns about hardware and software reliability. In this poster, we present FT-GEMM, a high-performance GEMM capable of tolerating soft errors on the fly. We incorporate the fault-tolerant functionality at the algorithmic level by fusing the memory-intensive operations into the GEMM assembly kernels. We design a cache-friendly scheme for parallel FT-GEMM. Experimental results on Intel Cascade Lake demonstrate that FT-GEMM offers high reliability and performance: it is faster than Intel MKL, OpenBLAS, and BLIS by 3.50% to 22.14% for both serial and parallel GEMM, even under hundreds of errors injected per minute.
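The abstract does not spell out the fault-tolerance scheme. A common way to tolerate soft errors at the algorithmic level is checksum-based algorithm-based fault tolerance (ABFT), and the sketch below assumes that classical scheme purely for illustration. It is not the authors' fused assembly kernel: the function names, the `inject` hook, and the `tol` threshold are all hypothetical, and the checksums here are computed in separate passes rather than fused into the multiplication.

```python
def matmul(a, b):
    """Plain triple-loop product of two matrices stored as lists of lists."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def abft_gemm(a, b, inject=None, tol=1e-9):
    """Compute C = A * B with checksum verification (illustrative ABFT sketch).

    inject is a hypothetical test hook (row, col, delta) that corrupts one
    element of C to simulate a soft error. Returns (C, corrected_index),
    where corrected_index is None if no error was detected.
    """
    n, k, m = len(a), len(b), len(b[0])

    # Encode the operands: column sums of A and row sums of B.
    a_col_sum = [sum(a[i][p] for i in range(n)) for p in range(k)]
    b_row_sum = [sum(b[p]) for p in range(k)]

    c = matmul(a, b)
    if inject is not None:
        r, s, delta = inject
        c[r][s] += delta  # simulated soft error in the result

    # Checksums that an error-free C must satisfy.
    expected_col = [sum(a_col_sum[p] * b[p][j] for p in range(k)) for j in range(m)]
    expected_row = [sum(a[i][p] * b_row_sum[p] for p in range(k)) for i in range(n)]

    # Differences between the actual and expected checksums.
    col_diff = [sum(c[i][j] for i in range(n)) - expected_col[j] for j in range(m)]
    row_diff = [sum(c[i]) - expected_row[i] for i in range(n)]

    bad_rows = [i for i, d in enumerate(row_diff) if abs(d) > tol]
    bad_cols = [j for j, d in enumerate(col_diff) if abs(d) > tol]

    if not bad_rows and not bad_cols:
        return c, None  # checksums agree: no error detected
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        c[i][j] -= row_diff[i]  # the checksum gap equals the error magnitude
        return c, (i, j)
    raise RuntimeError("more than one corrupted element; recompute required")


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    result, fixed = abft_gemm(a, b, inject=(0, 1, 100))
    print(result, fixed)
```

The intersection of the mismatching row and column checksums locates a single corrupted element, and the size of the mismatch gives the correction. With floating-point inputs, `tol` would need to scale with the magnitude of the data, since rounding alone makes the checksums differ slightly.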
