Fully Homomorphic Encryption (FHE) protects private data on third-party cloud servers by allowing computations to run directly on the data in encrypted form. However, to support general-purpose encrypted computations, all existing FHE schemes require an expensive operation known as bootstrapping. The computation cost and memory bandwidth that bootstrapping requires add significant overhead to FHE-based computations, limiting the practical use of FHE. In this work, we propose FAB, an FPGA-based accelerator for bootstrappable FHE. Prior FPGA-based FHE accelerators have accelerated basic FHE primitives in hardware, but only for impractical parameter sets and without support for bootstrapping. FAB, for the first time, accelerates bootstrapping (along with basic FHE primitives) on an FPGA for a secure and practical parameter set. The key contribution of our work is a balanced FAB design that is not memory bound. To this end, we leverage recent bootstrapping algorithms while remaining cognizant of the compute and memory constraints of our FPGA. We use a minimal number of functional units for computing, operate at a low frequency, leverage high data rates to and from main memory, utilize the limited on-chip memory effectively, and schedule operations carefully. For bootstrapping a fully-packed ciphertext while operating at 300 MHz, FAB outperforms existing state-of-the-art CPU and GPU implementations by 213x and 1.5x, respectively. Our target FHE application is training a logistic regression model over encrypted data. For logistic regression model training scaled to 8 FPGAs on the cloud, FAB outperforms a CPU and a GPU by 456x and 6.5x, respectively, and provides competitive performance compared to the state-of-the-art ASIC design at a fraction of the cost.