Adaptable Register File Organization for Vector Processors

Cristóbal Ramírez Lazo, Enrico Reggiani, Carlos Rojas Morales, Roger Figueras Bagué, Luis Alfonso Villa Vargas, Marco Antonio Ramírez Salinas, Mateo Valero Cortés, Osman Sabri Unsal, Adrián Cristal

from arXiv, 28th IEEE International Symposium on High-Performance Computer Architecture (HPCA 2022)

Modern scientific applications are becoming more diverse, and the vector lengths in those applications vary widely. Contemporary Vector Processors (VPs) are designed either for short vector lengths, e.g., Fujitsu A64FX with 512-bit ARM SVE vector support, or for long vectors, e.g., NEC Aurora Tsubasa with a 16-Kbit Maximum Vector Length (MVL). Unfortunately, both approaches have drawbacks. On the one hand, short-vector VP designs struggle to provide high efficiency for applications featuring long vectors with high Data Level Parallelism (DLP). On the other hand, long-vector VP designs waste resources and underutilize the Vector Register File (VRF) when executing low-DLP applications with short vector lengths. Those long-vector VP implementations are therefore limited to a specialized subset of applications, where relatively high DLP must be present to achieve excellent performance with high efficiency. To overcome these limitations, we propose an Adaptable Vector Architecture (AVA) that offers the best of both worlds. AVA is designed for short vectors (MVL = 16 elements) and is thus area- and energy-efficient. However, AVA can reconfigure the MVL, which lets it exploit the benefits of a longer-vector microarchitecture (up to 128 elements) when abundant DLP is present. We model AVA on the gem5 simulator and evaluate its performance with six applications taken from the RiVEC Benchmark Suite. To obtain area and power consumption metrics, we model AVA on McPAT for 22nm technology. Our results show that by reconfiguring our small VRF (8KB) and using our novel issue queue scheme, AVA yields a 2x speedup over the default configuration for short vectors. Additionally, AVA shows competitive performance when compared to a long-vector VP, while saving 50% of the area.
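
The capacity trade-off behind a reconfigurable MVL can be illustrated with simple arithmetic. The sketch below is not the paper's mechanism; it only computes how many full-length vector registers fit in a fixed-size VRF as the MVL grows. The 8KB capacity and the 16 to 128 element range come from the abstract, while the 64-bit element width and the helper name `registers_that_fit` are assumptions made for illustration.

```python
def registers_that_fit(vrf_bytes: int, mvl_elements: int, element_bytes: int = 8) -> int:
    """Return how many full-length vector registers fit in a VRF of fixed capacity.

    element_bytes defaults to 8 (64-bit elements); this width is an assumption,
    the abstract does not state it.
    """
    if vrf_bytes <= 0 or mvl_elements <= 0 or element_bytes <= 0:
        raise ValueError("all sizes must be positive")
    return vrf_bytes // (mvl_elements * element_bytes)


VRF_BYTES = 8 * 1024  # 8KB VRF, as stated in the abstract

# Sweep the MVL from the short-vector default (16) to the longest setting (128).
for mvl in (16, 32, 64, 128):
    print(f"MVL={mvl:3d} elements -> {registers_that_fit(VRF_BYTES, mvl)} registers")
```

Under these assumptions, every doubling of the MVL halves the number of registers the same storage can hold, so longer vectors are obtained at the cost of fewer registers resident in the VRF.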