PIM or CXL-PIM? Understanding Architectural Trade-offs Through Large-Scale Benchmarking

I-Ting Lee, Bao-Kai Wang, Liang-Chi Chen, Wen Sheng Lim, Da-Wei Chang, Yu-Ming Chang, Chieng-Chung Ho

Processing-in-memory (PIM) reduces data movement by executing computation near memory, but our large-scale characterization on real PIM hardware shows that end-to-end performance is often limited by disjoint host and device address spaces that force explicit staging transfers. In contrast, CXL-PIM provides a unified address space and cache-coherent access at the cost of higher access latency. These opposing interface models create workload-dependent trade-offs that small-scale studies do not capture. This work presents a side-by-side, large-scale comparison of PIM and CXL-PIM using measurements from real PIM hardware and trace-driven CXL modeling. We identify when unified-address access amortizes link latency enough to overcome transfer bottlenecks, and when tightly coupled PIM remains preferable. Our results reveal phase- and dataset-size regimes in which the relative ranking of the two architectures reverses, offering practical guidance for future near-memory system design.
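The central contrast in the abstract, explicit staging copies versus per-access link latency, can be illustrated with a back-of-the-envelope model. The sketch below is a hypothetical first-order model, not the authors' methodology (they rely on real PIM hardware measurements and trace-driven CXL modeling). It assumes PIM pays a fixed setup cost plus a bandwidth-bound copy, while CXL-PIM skips the copy but pays link latency per cache line, partly hidden by a fixed number of outstanding requests. The class names, parameters, and every numeric value are illustrative assumptions, and in-memory compute time is assumed equal on both sides and therefore omitted.

```python
"""Hypothetical first-order model: explicit staging (PIM) vs. unified-address
access over a CXL link (CXL-PIM). All parameters are illustrative assumptions,
not measurements from the paper."""

from dataclasses import dataclass


@dataclass(frozen=True)
class PimStaging:
    """PIM with a disjoint address space: data must be copied explicitly."""
    setup_s: float            # assumed fixed cost per staging transfer, seconds
    bandwidth_bytes_s: float  # assumed sustained host<->PIM copy bandwidth

    def transfer_time(self, nbytes: int) -> float:
        # One explicit copy: fixed overhead plus bandwidth-bound streaming.
        return self.setup_s + nbytes / self.bandwidth_bytes_s


@dataclass(frozen=True)
class CxlAccess:
    """CXL-PIM with a unified address space: no copy, but per-access latency."""
    latency_s: float   # assumed latency of one access over the CXL link
    line_bytes: int    # bytes per access (cache-line granularity)
    outstanding: int   # assumed in-flight accesses that overlap latency

    def effective_bandwidth(self) -> float:
        # Latency-bound rate: `outstanding` lines delivered every `latency_s`.
        return self.outstanding * self.line_bytes / self.latency_s

    def access_time(self, nbytes: int) -> float:
        lines = -(-nbytes // self.line_bytes)  # ceiling division
        return lines * self.latency_s / self.outstanding


def break_even_bytes(pim: PimStaging, cxl: CxlAccess) -> float:
    """Size where both costs match (ignoring cache-line rounding).

    Smaller datasets favor CXL-PIM because PIM's fixed setup dominates;
    larger ones favor PIM if its copy bandwidth beats CXL's latency-bound
    rate. Returns inf if CXL-PIM is never slower under these assumptions.
    """
    cxl_bw = cxl.effective_bandwidth()
    if cxl_bw >= pim.bandwidth_bytes_s:
        return float("inf")
    # setup + n/bw_pim == n/bw_cxl  =>  n = setup / (1/bw_cxl - 1/bw_pim)
    return pim.setup_s / (1.0 / cxl_bw - 1.0 / pim.bandwidth_bytes_s)


if __name__ == "__main__":
    pim = PimStaging(setup_s=50e-6, bandwidth_bytes_s=10e9)          # assumed
    cxl = CxlAccess(latency_s=300e-9, line_bytes=64, outstanding=16)  # assumed
    for n in (64 << 10, 1 << 20, 64 << 20):
        t_pim, t_cxl = pim.transfer_time(n), cxl.access_time(n)
        winner = "CXL-PIM" if t_cxl < t_pim else "PIM"
        print(f"{n:>10} B  PIM {t_pim * 1e6:9.1f} us  CXL {t_cxl * 1e6:9.1f} us  -> {winner}")
    print(f"break-even ~ {break_even_bytes(pim, cxl) / 1024:.0f} KiB")
```

With these assumed numbers, CXL-PIM comes out ahead at 64 KiB and PIM at 1 MiB and 64 MiB, with the crossover near 253 KiB: a toy version of the dataset-size ranking reversal the abstract reports. Real crossovers depend on measured parameters and on effects this model ignores, such as data reuse across phases, sparse host access, and coherence traffic.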

翻译：内存内处理（PIM）通过在内存附近执行计算来减少数据移动，但我们在真实 PIM 硬件上的大规模特性分析表明，端到端性能常受限于主机与设备地址空间分离所导致的显式暂存传输。相比之下，CXL-PIM 提供统一地址空间和缓存一致性访问，但代价是更高的访问延迟。这两种对立的接口模型产生了依赖工作负载的权衡，而小规模研究未能充分捕捉。本研究基于真实 PIM 硬件测量和基于追踪的 CXL 建模，对 PIM 和 CXL-PIM 进行了并行大规模比较。我们明确了统一地址访问何时能充分分摊链路延迟以克服传输瓶颈，以及紧密耦合的 PIM 何时仍更具优势。研究结果揭示了两种架构相对性能排序发生逆转的阶段和数据集规模区间，为未来近内存系统设计提供了实用指导。

0

相关内容

【ACL2022】一个用于远距监督关系抽取的层级对比学习框架, HiCLRE: A Hierarchical Contrastive Learning Framework for Distantly Supervised Relation Extraction

【ACL2022】一个用于远距监督关系抽取的层级对比学习框架, HiCLRE: A Hierarchical Contrastive Learning Framework for Distantly Supervised Relation Extraction

专知会员服务

15+阅读 · 2022年3月24日

【CVPR 2022】基于实例深度估计的统一深度感知全景分割 PanopticDepth: Per-Instance Depth Estimation for Unified Depth-Aware Panoptic Segmentation

【CVPR 2022】基于实例深度估计的统一深度感知全景分割 PanopticDepth: Per-Instance Depth Estimation for Unified Depth-Aware Panoptic Segmentation

专知会员服务

18+阅读 · 2022年3月19日

【ICML2021】图对比学习自动化

专知会员服务

41+阅读 · 2021年6月19日

【香港中文大学-VLDB2020】Dash:可扩展的持久内存哈希，Scalable Hashing

【香港中文大学-VLDB2020】Dash:可扩展的持久内存哈希，Scalable Hashing

专知会员服务

25+阅读 · 2020年3月17日

实时强化学习《Real-Time Reinforcement Learning》S Ramstedt, C Pal [Mila, Element AI] (2019)

实时强化学习《Real-Time Reinforcement Learning》S Ramstedt, C Pal [Mila, Element AI] (2019)

专知会员服务

13+阅读 · 2019年11月17日

【NeurIPS2020-MIT】子图神经网络，Subgraph Neural Networks

【NeurIPS2020-MIT】子图神经网络，Subgraph Neural Networks

专知

38+阅读 · 2020年9月30日

【KDD2020-Tutorial】因果推理与稳定学习，Causal Inference and Stable Learning

【KDD2020-Tutorial】因果推理与稳定学习，Causal Inference and Stable Learning

专知

11+阅读 · 2020年8月28日

Python图像处理，366页pdf，Image Operators Image Processing in Python

Python图像处理，366页pdf，Image Operators Image Processing in Python

专知

15+阅读 · 2020年7月23日

【MIT】最优传输图神经网络，Optimal Transport Graph Neural Networks

【MIT】最优传输图神经网络，Optimal Transport Graph Neural Networks

专知

18+阅读 · 2020年6月22日

【阿里巴巴-WWW2020】对抗性多模态表示学习的点击率预测，Adversarial Multimodal RL

【阿里巴巴-WWW2020】对抗性多模态表示学习的点击率预测，Adversarial Multimodal RL

专知

11+阅读 · 2020年3月17日

语义Web知识库补全关键技术研究

国家自然科学基金

17+阅读 · 2017年12月31日

基于编译的PCM内存损耗均衡方法研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于动态匹配的高能量利用率多层堆叠结构静态随机存储器（SRAM）关键技术

国家自然科学基金

0+阅读 · 2015年12月31日

基于决策模型和预备电位的运动想象BCI研究

国家自然科学基金

3+阅读 · 2015年12月31日

语义关联的地理视频数据自适应组织方法

国家自然科学基金

1+阅读 · 2014年12月31日

PyGraph: Robust Compiler Support for CUDA Graphs in PyTorch

Arxiv

0+阅读 · 12月17日

Focus: A Streaming Concentration Architecture for Efficient Vision-Language Models

Arxiv

0+阅读 · 12月16日

Efficient MoE Serving in the Memory-Bound Regime: Balance Activated Experts, Not Tokens

Arxiv

0+阅读 · 12月10日

Beluga: A CXL-Based Memory Architecture for Scalable and Efficient LLM KVCache Management

Arxiv

0+阅读 · 11月27日

Beluga: A CXL-Based Memory Architecture for Scalable and Efficient LLM KVCache Management

Arxiv

0+阅读 · 11月25日

VIP会员

文章信息

相关主题

相关VIP内容

【ACL2022】一个用于远距监督关系抽取的层级对比学习框架, HiCLRE: A Hierarchical Contrastive Learning Framework for Distantly Supervised Relation Extraction

【ACL2022】一个用于远距监督关系抽取的层级对比学习框架, HiCLRE: A Hierarchical Contrastive Learning Framework for Distantly Supervised Relation Extraction

专知会员服务

15+阅读 · 2022年3月24日

【CVPR 2022】基于实例深度估计的统一深度感知全景分割 PanopticDepth: Per-Instance Depth Estimation for Unified Depth-Aware Panoptic Segmentation

【CVPR 2022】基于实例深度估计的统一深度感知全景分割 PanopticDepth: Per-Instance Depth Estimation for Unified Depth-Aware Panoptic Segmentation

专知会员服务

18+阅读 · 2022年3月19日

【ICML2021】图对比学习自动化

专知会员服务

41+阅读 · 2021年6月19日

【香港中文大学-VLDB2020】Dash:可扩展的持久内存哈希，Scalable Hashing

【香港中文大学-VLDB2020】Dash:可扩展的持久内存哈希，Scalable Hashing

专知会员服务

25+阅读 · 2020年3月17日

实时强化学习《Real-Time Reinforcement Learning》S Ramstedt, C Pal [Mila, Element AI] (2019)

实时强化学习《Real-Time Reinforcement Learning》S Ramstedt, C Pal [Mila, Element AI] (2019)

专知会员服务

13+阅读 · 2019年11月17日

热门VIP内容

开通专知VIP会员享更多权益服务

前沿人工智能趋势报告（Frontier AI Trends Report）

【AAAI2026】善始则事半功倍：基于前缀优化的大语言模型推理强化学习

Andrej Karpathy：2025 年 LLM 年度回顾（2025 LLM Year in Review）

音退化问题：基于输入操控的鲁棒语音转换综述

相关资讯

【NeurIPS2020-MIT】子图神经网络，Subgraph Neural Networks

【NeurIPS2020-MIT】子图神经网络，Subgraph Neural Networks

专知

38+阅读 · 2020年9月30日

【KDD2020-Tutorial】因果推理与稳定学习，Causal Inference and Stable Learning

【KDD2020-Tutorial】因果推理与稳定学习，Causal Inference and Stable Learning

专知

11+阅读 · 2020年8月28日

Python图像处理，366页pdf，Image Operators Image Processing in Python

Python图像处理，366页pdf，Image Operators Image Processing in Python

专知

15+阅读 · 2020年7月23日

【MIT】最优传输图神经网络，Optimal Transport Graph Neural Networks

【MIT】最优传输图神经网络，Optimal Transport Graph Neural Networks

专知

18+阅读 · 2020年6月22日

【阿里巴巴-WWW2020】对抗性多模态表示学习的点击率预测，Adversarial Multimodal RL

【阿里巴巴-WWW2020】对抗性多模态表示学习的点击率预测，Adversarial Multimodal RL

专知

11+阅读 · 2020年3月17日

相关论文

PyGraph: Robust Compiler Support for CUDA Graphs in PyTorch

Arxiv

0+阅读 · 12月17日

Focus: A Streaming Concentration Architecture for Efficient Vision-Language Models

Arxiv

0+阅读 · 12月16日

Efficient MoE Serving in the Memory-Bound Regime: Balance Activated Experts, Not Tokens

Arxiv

0+阅读 · 12月10日

Beluga: A CXL-Based Memory Architecture for Scalable and Efficient LLM KVCache Management

Arxiv

0+阅读 · 11月27日

Beluga: A CXL-Based Memory Architecture for Scalable and Efficient LLM KVCache Management

Arxiv

0+阅读 · 11月25日

相关基金

语义Web知识库补全关键技术研究

国家自然科学基金

17+阅读 · 2017年12月31日

基于编译的PCM内存损耗均衡方法研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于动态匹配的高能量利用率多层堆叠结构静态随机存储器（SRAM）关键技术

国家自然科学基金

0+阅读 · 2015年12月31日

基于决策模型和预备电位的运动想象BCI研究

国家自然科学基金

3+阅读 · 2015年12月31日

语义关联的地理视频数据自适应组织方法

国家自然科学基金

1+阅读 · 2014年12月31日

微信扫码咨询专知VIP会员