CXL-based Computational Memory (CCM) enables near-memory processing within expanded remote memory, presenting opportunities to reduce the data movement costs of disaggregated memory systems and to accelerate overall performance. However, existing operation offloading mechanisms cannot exploit the trade-offs among the different offloading models built on different CXL protocols. This work first examines these trade-offs and demonstrates their impact on end-to-end performance and system efficiency for workloads with diverse data and processing requirements. We then propose a novel 'Asynchronous Back-Streaming' protocol that carefully layers data and control transfer operations on top of the underlying CXL protocols. We design KAI, a system that realizes the asynchronous back-streaming model and supports asynchronous data movement and lightweight pipelining in host-CCM interactions. Overall, KAI reduces end-to-end runtime by up to 50.4%, and reduces CCM and host idle times by an average of 22.11x and 3.85x, respectively.