Secure multi-party computation (MPC) enables computation directly on encrypted data, protecting both data and model privacy in deep learning inference. However, existing neural network architectures, including Vision Transformers (ViTs), are not designed or optimized for MPC and incur significant latency overhead. We observe that Softmax accounts for a major latency bottleneck due to its high communication complexity, yet it can be selectively replaced or linearized without compromising model accuracy. Hence, in this paper, we propose an MPC-friendly ViT, dubbed MPCViT, to enable accurate yet efficient ViT inference in MPC. Based on a systematic latency and accuracy evaluation of Softmax attention and other attention variants, we propose a heterogeneous attention optimization space. We also develop a simple yet effective MPC-aware neural architecture search algorithm for fast Pareto optimization. To further boost inference efficiency, we propose MPCViT+, which jointly optimizes Softmax attention and other network components, including GeLU and matrix multiplication. With extensive experiments, we demonstrate that MPCViT achieves 1.9%, 1.3%, and 4.6% higher accuracy with 6.2x, 2.9x, and 1.9x latency reduction compared with baseline ViT, MPCFormer, and THE-X on the Tiny-ImageNet dataset, respectively. MPCViT+ further achieves a 1.2x latency reduction on the CIFAR-100 dataset and reaches a better Pareto front than MPCViT.
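To make the core idea concrete, the sketch below contrasts standard Softmax attention with one possible linearized replacement. This is a minimal numpy illustration, not the paper's method: the ReLU-plus-normalization variant shown here is an assumption chosen only to demonstrate why removing `exp` and per-row Softmax division reduces MPC communication cost; MPCViT's actual attention variants and search space may differ.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard scaled dot-product attention.

    Softmax needs exponentiation and division, both of which
    require many communication rounds under MPC protocols.
    """
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    # Numerically stable Softmax over each row.
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ V

def linearized_attention(Q, K, V):
    """Hypothetical MPC-friendlier variant (illustrative only).

    ReLU followed by a simple row normalization avoids exp
    entirely; ReLU and addition are far cheaper in MPC than
    Softmax, which is the kind of replacement the paper exploits.
    """
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.maximum(scores, 0.0)
    # Small epsilon guards against all-zero rows after ReLU.
    w = w / (w.sum(axis=-1, keepdims=True) + 1e-6)
    return w @ V
```

Both functions map `(n, d)` queries/keys/values to an `(n, d)` output, so a linearized head can be swapped in per attention head, which is what makes a heterogeneous (mixed Softmax/linear) search space possible.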