Homomorphic Encryption (HE) is an emerging encryption scheme that allows computations to be performed directly on encrypted messages. This property enables promising applications such as privacy-preserving deep learning and cloud computing. Prior work has enabled practical privacy-preserving applications through architecture-aware optimizations on CPUs, GPUs, and FPGAs. However, there is no systematic optimization of the whole HE pipeline on Intel GPUs. In this paper, we present the first SYCL-based GPU backend for the Microsoft SEAL APIs. We apply optimizations at the instruction, algorithm, and application levels to accelerate our HE library, which is based on the Cheon, Kim, Kim and Song (CKKS) scheme, on Intel GPUs. Performance is validated on two of the latest Intel GPUs. Experimental results show that our staged optimizations, including low-level optimizations and kernel fusion, accelerate the Number Theoretic Transform (NTT), a key algorithm for HE, by up to 9.93X compared with the naïve GPU baseline. Roofline analysis confirms that our optimized NTT reaches 79.8% and 85.7% of peak performance on the two GPU devices. With the highly optimized NTT and assembly-level optimization, we obtain a 2.32X to 3.05X speedup for HE evaluation routines. In addition, our combined systematic optimizations improve the performance of an encrypted element-wise polynomial matrix multiplication application by up to 3.10X.
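For readers unfamiliar with the NTT, the following is a minimal CPU-side sketch of a textbook radix-2 transform and the polynomial multiplication it enables. It is an illustration only and not the SYCL implementation described above: the function names, the toy modulus 17, the length 8, and the root 2 are all assumptions chosen for readability, and the sketch computes a cyclic product, whereas CKKS libraries typically use a negacyclic variant with much larger parameters.

```python
def ntt(a, root, p):
    """Iterative radix-2 NTT of list `a` modulo prime `p`.

    Assumes len(a) is a power of two and `root` is a primitive
    len(a)-th root of unity modulo p (illustrative toy setup).
    """
    n = len(a)
    a = list(a)  # work on a copy

    # Bit-reversal permutation so butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

    # Cooley-Tukey butterflies, doubling the block length each stage.
    length = 2
    while length <= n:
        w_len = pow(root, n // length, p)  # primitive `length`-th root
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(half):
                u = a[start + k]
                v = a[start + k + half] * w % p
                a[start + k] = (u + v) % p
                a[start + k + half] = (u - v) % p
                w = w * w_len % p
        length <<= 1
    return a


def intt(a, root, p):
    """Inverse NTT: forward transform with root^-1, then scale by n^-1."""
    n = len(a)
    inv_root = pow(root, -1, p)  # modular inverse, Python 3.8+
    inv_n = pow(n, -1, p)
    return [x * inv_n % p for x in ntt(a, inv_root, p)]


def cyclic_mul(a, b, root, p):
    """Multiply two polynomials modulo (x^n - 1, p) via pointwise products."""
    fa, fb = ntt(a, root, p), ntt(b, root, p)
    return intt([x * y % p for x, y in zip(fa, fb)], root, p)


if __name__ == "__main__":
    P, ROOT = 17, 2  # toy parameters: 2 has order 8 modulo 17
    a = [1, 2, 3, 4, 0, 0, 0, 0]
    b = [5, 6, 7, 8, 0, 0, 0, 0]
    print(cyclic_mul(a, b, ROOT, P))
```

The transform replaces an O(n²) coefficient-wise product with three O(n log n) transforms and one pointwise multiplication, which is why it tends to dominate the cost of HE evaluation routines.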