Homomorphic Encryption (HE), which allows computation on encrypted data (ciphertext) without decrypting it first, enables secure but prohibitively slow Convolutional Neural Network (CNN) inference for privacy-preserving applications in the cloud. One approach to reducing the latency of HE-based CNN (HECNN) inference is to pack multiple messages into a single ciphertext, which reduces the number of ciphertexts and supports massive parallelism of Homomorphic Multiply-Accumulate (HMA) operations between ciphertexts. Although packing speeds up HECNN inference, the mainstream packing schemes, Dense Packing (DensePack) and Convolution Packing (ConvPack), introduce expensive rotation overhead, which prolongs inference latency for deeper and wider CNN architectures. In this paper, we propose FFConv, a low-rank factorization method dedicated to efficient ciphertext packing that reduces both the rotation overhead and the number of HMA operations. FFConv approximates a d × d convolution layer with low-rank factorized convolutions: a d × d low-rank convolution with fewer channels, followed by a 1 × 1 convolution that restores the channels. The d × d low-rank convolution with DensePack requires significantly fewer rotation operations, while the rotation overhead of the 1 × 1 convolution with ConvPack is close to zero. To our knowledge, FFConv is the first work capable of reducing the rotation overhead incurred by DensePack and ConvPack simultaneously, without introducing additional special blocks into the HECNN inference pipeline. Compared with the prior works LoLa and Falcon, our method reduces inference latency by up to 88% and 21%, respectively, with comparable accuracy on MNIST and CIFAR-10.
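To make the factorized structure concrete, the following is a minimal plaintext sketch in NumPy of replacing one d × d convolution with a d × d convolution with fewer output channels followed by a 1 × 1 convolution. It is an illustration under stated assumptions, not the paper's procedure: the truncated SVD used to obtain the two factors, the helper names `conv2d` and `factorize`, and the toy layer sizes are all hypothetical choices made here, and the sketch does not model encryption, DensePack, ConvPack, or rotations.

```python
import numpy as np


def conv2d(x, w):
    """Naive 'valid' cross-correlation, stride 1.

    x: input of shape (C_in, H, W); w: kernel of shape (C_out, C_in, d, d).
    """
    c_out, _, d, _ = w.shape
    h_out, w_out = x.shape[1] - d + 1, x.shape[2] - d + 1
    out = np.zeros((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i:i + d, j:j + d]
            # Contract over (C_in, d, d) to get one value per output channel.
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out


def factorize(w, rank):
    """Split a (C_out, C_in, d, d) kernel into two factors via truncated SVD.

    Assumption for illustration only: SVD is one standard way to obtain a
    low-rank factorization; the source does not say how FFConv obtains it.
    Returns a d x d kernel with `rank` output channels and a 1 x 1 kernel
    that maps `rank` channels back to C_out.
    """
    c_out, c_in, d, _ = w.shape
    u, s, vt = np.linalg.svd(w.reshape(c_out, -1), full_matrices=False)
    w_low = (s[:rank, None] * vt[:rank]).reshape(rank, c_in, d, d)
    w_1x1 = u[:, :rank].reshape(c_out, rank, 1, 1)
    return w_low, w_1x1


# Toy layer (hypothetical sizes): 8 -> 16 channels, 3 x 3 kernel, true rank 4.
rng = np.random.default_rng(0)
c_in, c_out, d, rank = 8, 16, 3, 4
w = (rng.standard_normal((c_out, rank))
     @ rng.standard_normal((rank, c_in * d * d))).reshape(c_out, c_in, d, d)
x = rng.standard_normal((c_in, 10, 10))

w_low, w_1x1 = factorize(w, rank)
y_full = conv2d(x, w)                        # original d x d convolution
y_fact = conv2d(conv2d(x, w_low), w_1x1)     # d x d low-rank, then 1 x 1

full_weights = w.size
fact_weights = w_low.size + w_1x1.size
```

Because the toy kernel is constructed with exact rank 4, the two outputs agree up to floating-point error; for a trained kernel the truncation would introduce an approximation error that depends on the chosen rank. Comparing `full_weights` with `fact_weights` shows the reduction in weights, which in plaintext also means fewer multiply-accumulates; how that translates into HMA and rotation counts depends on the packing scheme and is outside this sketch.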