As the application of deep learning continues to grow, so does the amount of data used to make predictions. While traditionally, big-data deep learning was constrained by computing performance and off-chip memory bandwidth, a new constraint has emerged: privacy. One solution is homomorphic encryption (HE). Applying HE to the client-cloud model allows cloud services to perform inference directly on the client's encrypted data. While HE can meet privacy constraints, it introduces enormous computational challenges and remains impractically slow in current systems. This paper introduces Cheetah, a set of algorithmic and hardware optimizations for HE DNN inference to achieve plaintext DNN inference speeds. Cheetah proposes HE-parameter tuning optimization and operator scheduling optimizations, which together deliver 79x speedup over the state-of-the-art. However, this still falls short of plaintext inference speeds by almost four orders of magnitude. To bridge the remaining performance gap, Cheetah further proposes an accelerator architecture that, when combined with the algorithmic optimizations, approaches plaintext DNN inference speeds. We evaluate several common neural network models (e.g., ResNet50, VGG16, and AlexNet) and show that plaintext-level HE inference for each is feasible with a custom accelerator consuming 30W and 545mm^2.