Privacy-preserving neural network (NN) inference solutions have recently gained significant traction, with several approaches offering different latency-bandwidth trade-offs. Many of these rely on homomorphic encryption (HE), a method of performing computations over encrypted data. However, even with state-of-the-art schemes, HE operations are still considerably slower than their plaintext counterparts. Pruning the parameters of an NN model is a well-known approach to improving inference latency. However, pruning methods that are useful in the plaintext context may yield a nearly negligible improvement in the HE case, as recent work has demonstrated. In this work, we propose a novel set of pruning methods that reduce latency and memory requirements, thus bringing the effectiveness of plaintext pruning methods to HE. Crucially, our proposal employs two key techniques, viz. permutation and expansion of the packed model weights, which enable pruning significantly more ciphertexts and recuperating most of the accuracy loss, respectively. We demonstrate the advantage of our methods on fully connected layers whose weights are packed using a recently proposed technique called tile tensors, which allows executing deep NN inference in a non-interactive mode. We evaluate our methods on various autoencoder architectures and demonstrate that for a small mean-square reconstruction loss of 1.5×10^{-5} on MNIST, they reduce the memory requirement and latency of HE-enabled inference by 60%.
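To make the permutation idea concrete, the following is a minimal sketch in plain NumPy (no actual HE library, and the tile shape and helper names are ours for illustration, not the paper's API). After pruning, zero weights are typically scattered, so few packed tiles are entirely zero; reordering rows concentrates the zeros into whole tiles, and each all-zero tile corresponds to a ciphertext that never needs to be stored or computed on.

```python
import numpy as np

TILE = 4  # toy tile side length; real tile-tensor shapes depend on the HE slot count


def count_nonzero_tiles(w: np.ndarray) -> int:
    """Count TILE x TILE blocks that still contain at least one nonzero weight."""
    r, c = w.shape
    blocks = w.reshape(r // TILE, TILE, c // TILE, TILE)
    return int(np.any(blocks != 0, axis=(1, 3)).sum())


def permute_rows_by_sparsity(w: np.ndarray) -> np.ndarray:
    """Reorder rows so the sparsest ones are adjacent, grouping zeros into tiles."""
    order = np.argsort(np.count_nonzero(w, axis=1))
    return w[order]


rng = np.random.default_rng(0)
w = rng.normal(size=(16, 16))
w[[1, 5, 9, 13], :] = 0.0  # pretend pruning zeroed these rows, one per tile row

print("droppable tiles before permutation:",
      16 - count_nonzero_tiles(w))  # 0: the zero rows are spread across tile rows
print("droppable tiles after permutation: ",
      16 - count_nonzero_tiles(permute_rows_by_sparsity(w)))  # 4: a full tile row of zeros
```

In the paper's pipeline, this more aggressive, tile-aligned pruning is then paired with the complementary expansion step, which recovers most of the resulting accuracy loss.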