Co-Design of the Dense Linear Algebra Software Stack for Multicore Processors

April 27, 2023

Héctor Martínez, Sandra Catalán, Francisco D. Igual, José R. Herrero, Rafael Rodríguez-Sánchez, Enrique S. Quintana-Ortí

This paper advocates for an intertwined design of the dense linear algebra software stack that breaks down the strict barriers between the high-level, blocked algorithms in LAPACK (Linear Algebra PACKage) and the low-level, architecture-dependent kernels in BLAS (Basic Linear Algebra Subprograms). Specifically, we propose customizing the GEMM (general matrix multiplication) kernel, which is invoked from the blocked algorithms for relevant matrix factorizations in LAPACK, to improve performance on modern multicore processors with hierarchical cache memories. To achieve this, we leverage an analytical model to dynamically adapt the cache configuration parameters of the GEMM to the shape of the matrix operands. Additionally, we accommodate the flexible development of architecture-specific micro-kernels that allow us to further improve the utilization of the cache hierarchy. Our experiments on two platforms, equipped with ARM (NVIDIA Carmel, Neon) and x86 (AMD EPYC, AVX2) multicore processors, demonstrate the benefits of this approach in terms of better cache utilization and, in general, higher performance. However, they also reveal the delicate balance between optimizing for multi-threaded parallelism and optimizing for cache usage.
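
To make the idea of shape-dependent cache configuration parameters concrete, here is a minimal sketch in pure Python. It is not the authors' analytical model: it assumes a simplified capacity rule in which each packed buffer may occupy a fixed fraction of one cache level, and it ignores set associativity and replacement policy. The cache sizes, the micro-kernel dimensions `mr` and `nr`, the `fill` fraction, and the names `CacheSizes`, `choose_blocking`, and `blocked_gemm` are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheSizes:
    """Cache capacities in bytes (hypothetical values, not taken from the paper)."""
    l1: int = 32 * 1024
    l2: int = 512 * 1024
    l3: int = 4 * 1024 * 1024


def round_down(value: int, multiple: int) -> int:
    """Round value down to a multiple of `multiple`, but never below `multiple`."""
    return max(multiple, (value // multiple) * multiple)


def choose_blocking(m, n, k, caches=CacheSizes(), mr=8, nr=12,
                    elem_bytes=8, fill=0.5):
    """Pick (mc, nc, kc) from the operand shape with a simplified capacity rule.

    Assumption: each packed buffer may use `fill` of one cache level.
    """
    # kc: a kc x nr micro-panel of B should fit in the L1 share.
    kc = min(k, max(1, int(fill * caches.l1) // (nr * elem_bytes)))
    # mc: an mc x kc block of A should fit in the L2 share.
    mc = min(m, round_down(int(fill * caches.l2) // (kc * elem_bytes), mr))
    # nc: a kc x nc panel of B should fit in the L3 share.
    nc = min(n, round_down(int(fill * caches.l3) // (kc * elem_bytes), nr))
    return mc, nc, kc


def blocked_gemm(A, B, C, mc, nc, kc):
    """Compute C += A * B on lists of lists using the jc / pc / ic loop order."""
    m, k, n = len(A), len(B), len(B[0])
    for jc in range(0, n, nc):              # column panel of B and C
        for pc in range(0, k, kc):          # slice of the shared k dimension
            for ic in range(0, m, mc):      # row block of A and C
                # The three loops below stand in for an optimized micro-kernel.
                for i in range(ic, min(ic + mc, m)):
                    row_a, row_c = A[i], C[i]
                    for p in range(pc, min(pc + kc, k)):
                        a_ip, row_b = row_a[p], B[p]
                        for j in range(jc, min(jc + nc, n)):
                            row_c[j] += a_ip * row_b[j]
    return C


if __name__ == "__main__":
    # A small k, as produced by a blocked factorization, leaves room for a larger mc.
    print(choose_blocking(2000, 2000, 128))
    print(choose_blocking(2000, 2000, 32))
```

With these placeholder cache sizes, a call with k = 128 yields mc = 256, while k = 32 yields mc = 1024. This is the kind of adaptation the abstract refers to: when the shared dimension is small, a fixed blocking tuned for large square operands would leave most of the L2 share unused.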
