Leveraging vectorisation, the ability for a CPU to apply operations to multiple elements of data concurrently, is critical for high-performance workloads. However, at the time of writing, commercially available physical RISC-V hardware that provides the RISC-V vector extension (RVV) supports only version 0.7.1, which is incompatible with the latest ratified version, 1.0. The challenge is that upstream compiler toolchains, such as Clang, target only the ratified v1.0 and do not support the older v0.7.1. Consequently, the only way to compile vectorised code for this hardware has been to use an older, vendor-provided compiler. In this paper we introduce the rvv-rollback tool, which translates assembly code generated by the compiler using vector extension v1.0 instructions into v0.7.1. We use this tool to compare the vectorisation performance of the vendor-provided GNU 8.4 compiler (which supports v0.7.1) against LLVM 15.0 (which supports only v1.0), and find that the LLVM compiler is capable of auto-vectorising more computational kernels and delivers greater performance than GNU in most, but not all, cases. We also tested LLVM vectorisation with vector-length-agnostic and vector-length-specific settings, and observed cases with significant differences in performance.
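To make the idea of assembly-level translation concrete, the following is a minimal sketch of the kind of textual rewrite such a tool might perform. It is not the rvv-rollback implementation. The function names `rollback_line` and `rollback` are hypothetical, and the two rules shown rest on stated assumptions: that v0.7.1's `vsetvli` takes no tail or mask policy operands (`ta`/`tu`, `ma`/`mu`), and that v0.7.1 unit-stride loads and stores are written `vle.v` and `vse.v`, with the element width taken from the current SEW rather than from the mnemonic. A real translator would need many more rules and would have to check that each rewrite preserves semantics.

```python
import re

# Illustrative sketch only; NOT the rvv-rollback implementation.
# Assumption 1: v0.7.1 vsetvli has no tail/mask policy operands.
# Assumption 2: v1.0 vle<W>.v / vse<W>.v map to v0.7.1 vle.v / vse.v,
#               which is only valid when W equals the SEW currently in effect.

_POLICY = re.compile(r",\s*(?:ta|tu|ma|mu)\b")
_WIDTH_MEM = re.compile(r"\b(vle|vse)(8|16|32|64)\.v\b")


def rollback_line(line: str) -> str:
    """Rewrite one line of v1.0-style assembly towards v0.7.1 syntax."""
    if line.strip().startswith("vsetvli"):
        # Drop the policy operands, keeping registers, SEW and LMUL.
        return _POLICY.sub("", line)
    # Remove the explicit element width from unit-stride loads and stores.
    return _WIDTH_MEM.sub(r"\1.v", line)


def rollback(asm: str) -> str:
    """Apply rollback_line to every line of an assembly listing."""
    return "\n".join(rollback_line(line) for line in asm.splitlines())


if __name__ == "__main__":
    v1_0 = "\n".join([
        "vsetvli t0, a0, e32, m1, ta, ma",
        "vle32.v v8, (a1)",
        "vadd.vv v8, v8, v8",
        "vse32.v v8, (a2)",
    ])
    print(rollback(v1_0))
```

Lines that match neither rule, such as `vadd.vv`, pass through unchanged. Working at the level of assembly text keeps the upstream compiler untouched, at the cost of handling each syntactic and semantic difference between the two versions explicitly.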