In this paper, we propose TensorFHE, an FHE acceleration solution based on GPGPU that enables real applications on encrypted data. TensorFHE utilizes Tensor Core Units (TCUs) to boost the computation of the Number Theoretic Transform (NTT), which is the most time-consuming part of FHE. Moreover, TensorFHE focuses on performing as many FHE operations as possible within a given time period rather than minimizing the latency of a single operation. Based on this idea, TensorFHE introduces operation-level batching to fully exploit the data parallelism of the GPGPU. We experimentally demonstrate that a GPGPU can achieve performance comparable to that of state-of-the-art ASIC accelerators. TensorFHE achieves 913 KOPS for NTT and 88 KOPS for HMULT (key FHE kernels) on an NVIDIA A100 GPGPU, which is 2.61x faster than the state-of-the-art FHE implementation on GPGPU. Moreover, TensorFHE delivers performance comparable to ASIC FHE accelerators, and is even 2.9x faster than F1+ on a specific workload. Such purely software-based acceleration on commercial hardware, with high performance, can open up the use of state-of-the-art FHE algorithms for a broad set of applications in real systems.