Fully Homomorphic Encryption (FHE) allows arbitrarily complex computations on encrypted data without ever needing to decrypt it, thus enabling us to maintain data privacy on third-party systems. Unfortunately, sustaining deep computations with FHE requires a periodic noise reduction step known as bootstrapping. The cost of the bootstrapping operation is one of the primary barriers to the wide-spread adoption of FHE. In this paper, we present an in-depth architectural analysis of the bootstrapping step in FHE. First, we observe that secure implementations of bootstrapping exhibit a low arithmetic intensity (<1 Op/byte), require large caches (>100 MB), and are heavily bound by the main memory bandwidth. Consequently, we demonstrate that existing workloads observe marginal performance gains from the design of bespoke high-throughput arithmetic units tailored to FHE. Second, we propose several cache-friendly algorithmic optimizations that improve the throughput in FHE bootstrapping by enabling up to 3.2x higher arithmetic intensity and 4.6x lower memory bandwidth. Our optimizations apply to a wide range of structurally similar computations such as private evaluation and training of machine learning models. Finally, we incorporate these optimizations into an architectural tool which, given a cache size, memory subsystem, the number of functional units and a desired security level, selects optimal cryptosystem parameters to maximize the bootstrapping throughput. Our optimized bootstrapping implementation represents a best-case scenario for compute acceleration of FHE. We show that despite these optimizations, bootstrapping continues to be bottlenecked by main memory bandwidth. We propose new research directions to address the underlying memory bottleneck. In summary, our answer to the titular question is: yes, but only after addressing the memory bottleneck!
翻译:完全基因加密( FHE) 允许对加密数据进行任意复杂的计算,而无需对其进行解密,从而使我们能够在第三方系统中保持数据隐私。 不幸的是, 与 FHE 保持深度计算需要一个定期的减少噪音步骤, 称为靴式。 靴式操作的成本是广泛采用FHE的主要障碍之一。 在本文中, 我们对FHE 的靴式步骤进行深入的建筑分析。 首先, 我们观察到, 安全地进行靴式穿梭的操作显示出一种低算术强度( < 1 Op/byte), 需要大量的缓存( > 100 MB), 并且受主要记忆带带宽的束缚。 因此, 我们证明, 现有的工作量观察着从设计为 FHEE 设计的高通量计算器中获得的边际性工作成绩。 其次, 我们提出一些方便于缓冲式的算法式优化, 使FHEHE 靴式的铺路速度达到3. 2x 更高; 缩略式的缩略图显示。 我们的优化适用于一系列结构相似的计算方法, 如私人评估和精密级的精度精细的精度, 显示一个最精细的内存储的内, 我们的内存到最精度的机的内最精度, 最后将这些机的缩化的缩化的缩化的缩缩缩略式的缩略式的缩略式的内。