Fully Homomorphic Encryption (FHE) allows secure computation on encrypted data. We present BASALISC, a family of hardware accelerator architectures that aims to substantially accelerate FHE computations in the cloud. BASALISC is the first to implement the BGV scheme with fully-packed bootstrapping, the noise-removal capability necessary to support arbitrary-depth computation. We propose a generalized version of bootstrapping that can be implemented directly in our hardware, instantiated with Montgomery multipliers that save 46% in silicon area and 40% in power consumption compared to traditional approaches. BASALISC's four-layer memory hierarchy includes a two-dimensional conflict-free inner memory layer that enables 32 Tb/s radix-256 NTT computations without pipeline stalls. Our conflict-resolution permutation hardware is generalized and reused to compute BGV automorphisms without throughput penalty. BASALISC also has a custom multiply-accumulate unit to accelerate BGV key switching. Both BASALISC's computation units and its inner memory layers are designed in asynchronous logic, allowing them to run at different speeds to optimize each function. To evaluate BASALISC, we study its physical realizability, emulate and formally verify its core functional units, and study its performance on a set of benchmarks. First, we evaluate a single iteration of logistic regression training over encrypted data, an application that translates to 513 bootstraps, 900K high-level BASALISC instructions, or 27B low-level BASALISC instructions, showing that BASALISC is only 3,500 times slower than an Intel Xeon-class processor running without data encryption. We also run an individual bootstrapping operation, for which we show a speedup of 4,000 times over HElib, a popular software FHE library.
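The abstract names Montgomery multipliers but does not describe their design. As a generic illustration of the underlying arithmetic only, and not of BASALISC's hardware, the following minimal Python sketch shows textbook Montgomery modular multiplication. The function names, the word size `k`, and the example modulus are assumptions chosen for illustration.

```python
def montgomery_setup(q: int, k: int):
    """Precompute constants for an odd modulus q < R, where R = 2**k."""
    R = 1 << k
    assert q % 2 == 1 and q < R
    q_neg_inv = (-pow(q, -1, R)) % R   # -q^{-1} mod R (needs Python 3.8+)
    r2 = (R * R) % q                   # R^2 mod q, used to enter Montgomery form
    return q_neg_inv, r2


def redc(t: int, q: int, k: int, q_neg_inv: int) -> int:
    """Montgomery reduction: return t * R^{-1} mod q for 0 <= t < q * R."""
    mask = (1 << k) - 1
    m = ((t & mask) * q_neg_inv) & mask  # makes t + m*q divisible by R
    u = (t + m * q) >> k                 # exact division by R, result < 2q
    return u - q if u >= q else u


def mont_mul(a: int, b: int, q: int, k: int) -> int:
    """Compute (a * b) mod q using only Montgomery reductions (illustrative)."""
    q_neg_inv, r2 = montgomery_setup(q, k)
    a_m = redc((a % q) * r2, q, k, q_neg_inv)  # a * R mod q
    b_m = redc((b % q) * r2, q, k, q_neg_inv)  # b * R mod q
    p_m = redc(a_m * b_m, q, k, q_neg_inv)     # a * b * R mod q
    return redc(p_m, q, k, q_neg_inv)          # leave Montgomery form
```

The reduction replaces division by `q` with shifts and masks by a power of two, which is the general reason this form is attractive in hardware. In practice, operands would stay in Montgomery form across many multiplications rather than being converted on every call as in this sketch.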