Heterogeneous chiplet-based systems improve scaling by disaggregating CPUs/GPUs and emerging memory technologies (HBM/DRAM). However, this on-package disaggregation introduces latency in the Network-on-Interposer (NoI). We observe that in modern large-model inference, parameters and activations routinely move back and forth between HBM and DRAM, injecting large, bursty flows into the interposer. These memory-driven transfers inflate tail latency and violate Service Level Agreements (SLAs) across k-ary n-cube baseline NoI topologies. To address this gap, we introduce an Interference Score (IS) that quantifies worst-case slowdown under contention. We then formulate NoI synthesis as a multi-objective optimization (MOO) problem and develop PARL (Partition-Aware Reinforcement Learner), a topology generator that balances throughput, latency, and power. PARL-generated topologies reduce contention at the memory cut, meet SLAs, and cut worst-case slowdown to 1.2x while maintaining competitive mean throughput relative to link-rich meshes. Overall, this work reframes NoI design for heterogeneous chiplet accelerators with workload-aware objectives.