加速数字理论变换, 用于在 GPUs 上进行可应用的自动自动加密 (Accelerating Number Theoretic Transformations for Bootstrappable Homomorphic Encryption on GPUs)

Homomorphic encryption (HE) draws huge attention as it provides a way of privacy-preserving computations on encrypted messages. Number Theoretic Transform (NTT), a specialized form of Discrete Fourier Transform (DFT) in the finite field of integers, is the key algorithm that enables fast computation on encrypted ciphertexts in HE. Prior works have accelerated NTT and its inverse transformation on a popular parallel processing platform, GPU, by leveraging DFT optimization techniques. However, these GPU-based studies lack a comprehensive analysis of the primary differences between NTT and DFT or only consider small HE parameters that have tight constraints in the number of arithmetic operations that can be performed without decryption. In this paper, we analyze the algorithmic characteristics of NTT and DFT and assess the performance of NTT when we apply the optimizations that are commonly applicable to both DFT and NTT on modern GPUs. From the analysis, we identify that NTT suffers from severe main-memory bandwidth bottleneck on large HE parameter sets. To tackle the main-memory bandwidth issue, we propose a novel NTT-specific on-the-fly root generation scheme dubbed on-the-fly twiddling (OT). Compared to the baseline radix-2 NTT implementation, after applying all the optimizations, including OT, we achieve 4.2x speedup on a modern GPU.

翻译：基因加密( HH) 引起极大注意, 因为它为加密信件提供了一种隐私保存计算方法。数字理论变换( NTT) 是一种在有限整数字段中分解 Fleier 变换( DFT) 的专门形式, 是能够快速计算高超加密密码文本的关键算法。先前的工作通过利用 DFT 优化技术, 加速了NTT 及其在流行的平行处理平台 GPU 上的反向变换。然而, 这些基于 GPU 的研究缺乏对 NTT 和 DFT 之间主要差异的综合分析, 或只考虑在不解密情况下可以进行计算操作的数量有严格限制的小 HE参数。在本文中, 我们分析NTT和 DFT的算法特性, 并在我们应用现代GPUP通常适用于 DFT 和 NTTT 的优化时评估NTT的性能。我们从分析中发现, NTTT在大型 HE 参数组上存在严重主模带带带带宽的瓶瓶。要处理主带宽带宽带宽的问题, 要解决主带宽带带宽问题, 我们提议在新NTTTTTTTTF-2 底基底底线上执行NTBLOBLU- broft- 底基底基底线上, 上, 我们底基底基底线。