Fully homomorphic encryption (FHE) schemes such as RNS-CKKS enable privacy-preserving outsourced computation (PPOC) but suffer from high computational latency and ciphertext expansion, especially on resource-constrained edge devices. Hybrid homomorphic encryption (HHE) mitigates these issues on the edge by replacing HE-based encryption of plaintexts with lightweight symmetric encryption, such as the Rubato cipher in the HHE variant of RNS-CKKS, but it introduces transciphering overhead in the cloud. The complementary strengths and limitations of FHE and HHE call for a dual-mode HHE solution that can flexibly switch between algorithms. This paper presents DNA-HHE, the first dual-mode HHE accelerator with near-network coupling for edge devices. DNA-HHE supports both edge-side RNS-CKKS and Rubato within a unified architecture driven by flexible custom instructions. To achieve a compact edge-side implementation, we propose a DSP-efficient modular reduction design, a compact multi-field-adaptive butterfly unit, and parallel scheduling schemes for Rubato with a high degree of resource sharing. DNA-HHE integrates network protocol packaging and transmission capabilities and is coupled directly to the network interface controller, reducing the overall latency of edge-side PPOC by 1.09$\times$ to 1.56$\times$. Our evaluations on ASIC and FPGA platforms demonstrate that DNA-HHE outperforms state-of-the-art single-mode designs for both edge-side RNS-CKKS and symmetric ciphers in computation latency and area efficiency, while additionally offering dual-mode functionality.
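The abstract names a DSP-efficient modular reduction and a multi-field-adaptive butterfly unit but does not describe how they work. The following sketch is only a rough point of reference for the generic arithmetic that such units accelerate. It implements Cooley-Tukey (forward) and Gentleman-Sande (inverse) NTT butterflies over $\mathbb{Z}_q$ using textbook Barrett reduction, so switching fields only requires reloading $q$ and the precomputed constant $\mu$. The function names and the example moduli (12289, 998244353) are illustrative assumptions. This is not DNA-HHE's DSP-efficient reduction or its butterfly design.

```python
# Hypothetical sketch: a modulus-parameterized NTT butterfly built on textbook
# Barrett reduction. Illustrative only; NOT DNA-HHE's DSP-efficient design.


def barrett_params(q):
    """Precompute (k, mu) for Barrett reduction modulo q."""
    k = q.bit_length()
    mu = (1 << (2 * k)) // q
    return k, mu


def barrett_reduce(x, q, k, mu):
    """Return x mod q for 0 <= x < q*q using only multiply, shift, subtract."""
    if not 0 <= x < q * q:
        raise ValueError("x out of range for Barrett reduction")
    t = (x * mu) >> (2 * k)  # underestimates floor(x / q) by at most 2
    r = x - t * q
    while r >= q:            # at most two correction subtractions
        r -= q
    return r


def ct_butterfly(a, b, w, q, k, mu):
    """Cooley-Tukey (forward NTT) butterfly: (a + w*b, a - w*b) mod q."""
    t = barrett_reduce(w * b, q, k, mu)
    s = a + t
    if s >= q:
        s -= q
    d = a - t
    if d < 0:
        d += q
    return s, d


def gs_butterfly(a, b, w, q, k, mu):
    """Gentleman-Sande (inverse NTT) butterfly: (a + b, (a - b) * w) mod q."""
    s = a + b
    if s >= q:
        s -= q
    d = a - b
    if d < 0:
        d += q
    return s, barrett_reduce(d * w, q, k, mu)


if __name__ == "__main__":
    # The same routines serve any modulus once (k, mu) are recomputed for it.
    for q in (12289, 998244353):
        k, mu = barrett_params(q)
        a, b, w = 1234, 5678, 3
        s, d = ct_butterfly(a, b, w, q, k, mu)
        a2, b2 = gs_butterfly(s, d, pow(w, -1, q), q, k, mu)
        half = pow(2, -1, q)
        # GS with w^-1 undoes CT up to a factor of 2 on each output.
        print(q, (a2 * half) % q == a, (b2 * half) % q == b)
```

Recomputing `(k, mu)` per modulus is one simple way for a single datapath to serve several prime fields, such as the multiple RNS limbs of RNS-CKKS. The abstract does not say whether DNA-HHE takes this approach.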