Homomorphic encryption (HE) enables computations on encrypted data by concealing information under noise for security. However, the process of bootstrapping, which resets the noise level in the ciphertext, is computationally expensive and requires a large bootstrapping key. The TFHE scheme offers a faster and programmable bootstrapping algorithm called PBS, crucial for security-focused applications like machine learning. Nevertheless, the current TFHE scheme lacks support for ciphertext packing, resulting in low throughput. This work thoroughly analyzes TFHE bootstrapping, identifies the bottleneck in GPUs caused by the blind rotation fragmentation problem, and proposes a hardware TFHE accelerator called Strix. Strix introduces a two-level batching approach to enhance the batch size in PBS, utilizes a specialized microarchitecture for efficient streaming data processing, and incorporates a fully-pipelined FFT microarchitecture to improve performance. It achieves significantly higher throughput than state-of-the-art implementations on both CPUs and GPUs, outperforming existing TFHE accelerators by a factor of 7.4.
翻译:暂无翻译