加速十进制乘法 (Speeding up decimal multiplication)

Decimal multiplication is the task of multiplying two numbers in base $10^N.$ Specifically, we focus on the number-theoretic transform (NTT) family of algorithms. Using only portable techniques, we achieve a 3x---5x speedup over the mpdecimal library. In this paper we describe our implementation and discuss further possible optimizations. We also present a simple cache-efficient algorithm for in-place $2n \times n$ or $n \times 2n$ matrix transposition, the need for which arises in the "six-step algorithm" variation of the matrix Fourier algorithm, and which does not seem to be widely known. Another finding is that use of two prime moduli instead of three makes sense even considering the worst case of increasing the size of the input, and makes for simpler answer recovery.

翻译：十进制乘法是将两个数字乘以 $10 元基数。具体地说, 我们专注于算法的数字理论变换( NTT) 组。我们仅使用便携式技术, 就能在微小库中实现3x-5x加速。在本文中, 我们描述我们的执行情况, 并讨论进一步可能的优化。我们还为本地的 $ 2 元或 $ 2 元的矩阵变换提供了一个简单的缓存效率算法, 其需要来自 Fourier 矩阵的“ 六步算法” 变换, 并且似乎还不是广为人知。另一个发现是, 使用两种正模( 3) 是合理的, 甚至考虑到增加输入规模的最糟糕的情况, 并且更简单地恢复答案。