In this paper, we study the problem of secure ML inference against a malicious client and a semi-trusted server such that the client only learns the inference output while the server learns nothing. This problem is first formulated by Lehmkuhl \textit{et al.} with a solution (MUSE, Usenix Security'21), whose performance is then substantially improved by Chandran et al.'s work (SIMC, USENIX Security'22). However, there still exists a nontrivial gap in these efforts towards practicality, giving the challenges of overhead reduction and secure inference acceleration in an all-round way. We propose SIMC 2.0, which complies with the underlying structure of SIMC, but significantly optimizes both the linear and non-linear layers of the model. Specifically, (1) we design a new coding method for homomorphic parallel computation between matrices and vectors. It is custom-built through the insight into the complementarity between cryptographic primitives in SIMC. As a result, it can minimize the number of rotation operations incurred in the calculation process, which is very computationally expensive compared to other homomorphic operations e.g., addition, multiplication). (2) We reduce the size of the garbled circuit (GC) (used to calculate nonlinear activation functions, e.g., ReLU) in SIMC by about two thirds. Then, we design an alternative lightweight protocol to perform tasks that are originally allocated to the expensive GCs. Compared with SIMC, our experiments show that SIMC 2.0 achieves a significant speedup by up to $17.4\times $ for linear layer computation, and at least $1.3\times$ reduction of both the computation and communication overheads in the implementation of non-linear layers under different data dimensions. Meanwhile, SIMC 2.0 demonstrates an encouraging runtime boost by $2.3\sim 4.3\times$ over SIMC on different state-of-the-art ML models.
翻译:在本文中,我们研究的是针对恶意客户和半信任服务器的安全 ML 推断问题,这样客户只能学习推论输出,而服务器却一无所获。 这个问题首先由 Lehmkuhl \ textit{et al.} 以一个解决方案( MUSE, Usenix Security' 21) 提出, Chandran 等人的工作( SIMC, USENIX Security 22) 大大改善了其性能。 然而, 这些努力在实用性方面还存在非边际差距, 使得管理减少和安全推论速度加快。 我们提议 SIMC 2.0, 它符合 SIMC 的基本结构, 但大大优化了模型的线性和非线性层 。 具体地说, 我们设计了一个新的编码方法, 用于Chandrandern et al- al- al- al- serveral 计算, 由SISMC 的对调控性原始原始原始的原始数据进行量。 结果, 它可以最大限度地减少在计算过程中进行的轮值操作数量, 而不是计算中, 我们的顺序的计算, 。