Fully Homomorphic Encryption (FHE) allows arbitrarily complex computations on encrypted data without ever decrypting it, which lets us maintain data privacy on third-party systems. Unfortunately, sustaining deep computations with FHE requires a periodic noise-reduction step known as bootstrapping. The cost of the bootstrapping operation is one of the primary barriers to the widespread adoption of FHE. In this paper, we present an in-depth architectural analysis of the bootstrapping step in FHE. First, we observe that secure implementations of bootstrapping exhibit low arithmetic intensity (<1 Op/byte) and require large caches (>100 MB), and as such are heavily bound by main memory bandwidth. Consequently, we demonstrate that existing workloads see only marginal performance gains from bespoke high-throughput arithmetic units tailored to FHE. Second, we propose several cache-friendly algorithmic optimizations that improve FHE bootstrapping throughput by enabling up to 3.2x higher arithmetic intensity and 4.6x lower memory bandwidth. Our optimizations apply to a wide range of structurally similar computations, such as private evaluation and training of machine learning models. Finally, we incorporate these optimizations into an architectural tool which, given a cache size, a memory subsystem, the number of functional units, and a desired security level, selects optimal cryptosystem parameters to maximize bootstrapping throughput. Our optimized bootstrapping implementation represents a best-case scenario for compute acceleration of FHE. We show that, despite these optimizations, bootstrapping remains bottlenecked by main memory bandwidth. We thus conclude that secure FHE implementations need to look beyond accelerated compute for further performance improvements, and we propose new research directions to address the underlying memory bottleneck.
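The claim that a workload with arithmetic intensity below 1 Op/byte gains little from faster arithmetic units can be illustrated with a simple roofline-style bound. The sketch below is not the paper's architectural tool. The function names, the peak compute rate, the memory bandwidth, and the baseline intensity of 0.5 Op/byte are all hypothetical values chosen for illustration; only the "<1 Op/byte" regime and the "up to 3.2x" intensity improvement come from the abstract.

```python
def attainable_ops_per_s(intensity, peak_ops_per_s, mem_bw_bytes_per_s):
    """Roofline bound: throughput is capped either by the compute peak
    or by how fast memory can feed operands (intensity * bandwidth)."""
    return min(peak_ops_per_s, intensity * mem_bw_bytes_per_s)


def is_memory_bound(intensity, peak_ops_per_s, mem_bw_bytes_per_s):
    """True when the memory-side cap is below the compute peak."""
    return intensity * mem_bw_bytes_per_s < peak_ops_per_s


# Hypothetical machine and workload (assumptions, not figures from the paper).
PEAK_OPS_PER_S = 10e12         # assumed compute peak, ops/s
MEM_BW_BYTES_PER_S = 1e12      # assumed main memory bandwidth, bytes/s
BASELINE_INTENSITY = 0.5       # assumed value in the "<1 Op/byte" regime
INTENSITY_GAIN = 3.2           # "up to 3.2x" from the abstract

if __name__ == "__main__":
    base = attainable_ops_per_s(
        BASELINE_INTENSITY, PEAK_OPS_PER_S, MEM_BW_BYTES_PER_S)
    # Doubling the arithmetic units leaves the bound unchanged.
    more_compute = attainable_ops_per_s(
        BASELINE_INTENSITY, 2 * PEAK_OPS_PER_S, MEM_BW_BYTES_PER_S)
    # Raising intensity moves the bound, but the workload stays memory bound.
    optimized = attainable_ops_per_s(
        BASELINE_INTENSITY * INTENSITY_GAIN, PEAK_OPS_PER_S, MEM_BW_BYTES_PER_S)
    print("baseline bound:      ", base)
    print("2x compute bound:    ", more_compute)
    print("3.2x intensity bound:", optimized)
    print("still memory bound:  ", is_memory_bound(
        BASELINE_INTENSITY * INTENSITY_GAIN, PEAK_OPS_PER_S, MEM_BW_BYTES_PER_S))
```

Under these assumed numbers, adding compute does not change the attainable throughput, while raising arithmetic intensity does. The improved workload nevertheless stays on the memory-bound side of the roofline, which is consistent with the abstract's conclusion.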