Memory load/store instructions account for a significant share of execution time and energy consumption in domain-specific accelerators. When designing highly parallel systems, the parallelism available at each granularity is extracted from the workloads. Exploiting that parallelism fully in high-performance designs requires multi-port memories. True multi-port designs are currently less popular because EDA tools offer no inherent support for memories with more than two ports; using more ports requires circuit-level implementation and hence a long design time. In this work, we present a framework for Design Space Exploration of Algorithmic Multi-Port Memories (AMMs) in ASICs. We study different AMM designs from the literature and discuss how we incorporate them into the pre-RTL Aladdin framework with different memory depths, port configurations, and banking structures. From our analysis of selected applications from MachSuite (an accelerator benchmark suite), we understand and quantify the potential of AMMs (as true multi-port memories) for high performance in applications with low spatial locality in their memory access patterns.