Acceleration of multiple precision matrix multiplication based on multi-component floating-point arithmetic using AVX2

In this paper, we report results on accelerating multiple precision matrix multiplication based on multi-component binary64 arithmetic using AVX2. We target double-double (DD), triple-double (TD), and quad-double (QD) precision arithmetic, all of which are built on error-free transformation (EFT) arithmetic. We implement SIMDized EFT functions that operate on four binary64 numbers simultaneously in an x86_64 computing environment, and use them to develop SIMDized DD, TD, and QD additions and multiplications. In addition, we adopt AVX2 load/store functions to speed up reading and writing matrix elements from and to memory. With these techniques combined, our multiple precision matrix multiplications run more than three times faster than the non-accelerated versions. The AVX2 acceleration also changes the performance obtained from parallelization with OpenMP.
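To make the idea of SIMDized EFTs concrete, the following is a minimal sketch in C of how TwoSum, QuickTwoSum, and an FMA-based TwoProd might be written over four binary64 lanes, and how DD addition and multiplication could be assembled from them. It is not the authors' code. All function names (`two_sum4`, `quick_two_sum4`, `two_prod4`, `dd_add_vec`, `dd_mul_vec`) are hypothetical, the storage of high and low parts in separate arrays is an assumption, the DD addition is the simpler "sloppy" variant, and the `target` attribute is specific to GCC and Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h>

/* GCC/Clang attribute: build these functions for AVX2 + FMA even without
   -mavx2 -mfma on the command line. The CPU must still support both. */
#define SIMD_TARGET __attribute__((target("avx2,fma")))

/* TwoSum: s = fl(a + b) and e = (a + b) - s exactly, in each of 4 lanes. */
SIMD_TARGET static inline void two_sum4(__m256d a, __m256d b,
                                        __m256d *s, __m256d *e)
{
    __m256d sum = _mm256_add_pd(a, b);
    __m256d bv  = _mm256_sub_pd(sum, a);   /* part of b absorbed into sum */
    __m256d av  = _mm256_sub_pd(sum, bv);  /* part of a absorbed into sum */
    __m256d be  = _mm256_sub_pd(b, bv);    /* rounding error from b */
    __m256d ae  = _mm256_sub_pd(a, av);    /* rounding error from a */
    *s = sum;
    *e = _mm256_add_pd(ae, be);
}

/* QuickTwoSum: same outputs as TwoSum, valid only if |a| >= |b| per lane. */
SIMD_TARGET static inline void quick_two_sum4(__m256d a, __m256d b,
                                              __m256d *s, __m256d *e)
{
    __m256d sum = _mm256_add_pd(a, b);
    *e = _mm256_sub_pd(b, _mm256_sub_pd(sum, a));
    *s = sum;
}

/* TwoProd via FMA: p = fl(a * b), e = a * b - p (exact barring over/underflow). */
SIMD_TARGET static inline void two_prod4(__m256d a, __m256d b,
                                         __m256d *p, __m256d *e)
{
    __m256d prod = _mm256_mul_pd(a, b);
    *e = _mm256_fmsub_pd(a, b, prod);      /* a * b - prod with one rounding */
    *p = prod;
}

/* c = a + b for n DD numbers ("sloppy" DD addition).
   Assumed layout: high and low parts live in separate arrays. */
SIMD_TARGET void dd_add_vec(size_t n,
                            const double *ahi, const double *alo,
                            const double *bhi, const double *blo,
                            double *chi, double *clo)
{
    assert(n % 4 == 0);  /* sketch only: no scalar remainder loop */
    for (size_t i = 0; i < n; i += 4) {
        __m256d s, e, hi, lo;
        two_sum4(_mm256_loadu_pd(ahi + i), _mm256_loadu_pd(bhi + i), &s, &e);
        e = _mm256_add_pd(e, _mm256_add_pd(_mm256_loadu_pd(alo + i),
                                           _mm256_loadu_pd(blo + i)));
        quick_two_sum4(s, e, &hi, &lo);    /* renormalize into (hi, lo) */
        _mm256_storeu_pd(chi + i, hi);
        _mm256_storeu_pd(clo + i, lo);
    }
}

/* c = a * b for n DD numbers; cross terms are accumulated with FMA. */
SIMD_TARGET void dd_mul_vec(size_t n,
                            const double *ahi, const double *alo,
                            const double *bhi, const double *blo,
                            double *chi, double *clo)
{
    assert(n % 4 == 0);  /* sketch only: no scalar remainder loop */
    for (size_t i = 0; i < n; i += 4) {
        __m256d xh = _mm256_loadu_pd(ahi + i), xl = _mm256_loadu_pd(alo + i);
        __m256d yh = _mm256_loadu_pd(bhi + i), yl = _mm256_loadu_pd(blo + i);
        __m256d p, e, hi, lo;
        two_prod4(xh, yh, &p, &e);
        e = _mm256_fmadd_pd(xh, yl, e);    /* e += a_hi * b_lo */
        e = _mm256_fmadd_pd(xl, yh, e);    /* e += a_lo * b_hi */
        quick_two_sum4(p, e, &hi, &lo);
        _mm256_storeu_pd(chi + i, hi);
        _mm256_storeu_pd(clo + i, lo);
    }
}
```

The unaligned `_mm256_loadu_pd` and `_mm256_storeu_pd` calls stand in for the AVX2 load/store step mentioned in the abstract. A matrix multiplication kernel would presumably call such routines in its inner loop, and TD and QD arithmetic would chain more EFT calls per operation in the same lane-wise manner.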
