The simultaneous rise of machine learning as a service and concerns over user privacy have increasingly motivated the need for private inference (PI). While recent work demonstrates that PI is possible using cryptographic primitives, the computational overheads render it impractical. The community is largely unprepared to address these overheads, as the slowdown in PI stems from the ReLU operator, whereas optimizations for plaintext inference focus on reducing FLOPs. In this paper we rethink the ReLU computation and propose optimizations for PI tailored to the properties of neural networks. Specifically, we reformulate ReLU as an approximate sign test and introduce a novel truncation method for the sign test that significantly reduces the cost per ReLU. These optimizations yield a specific type of stochastic ReLU. The key observation is that this stochastic fault behavior is well suited to the fault-tolerant nature of neural network inference, so we achieve significant savings without impacting accuracy. We collectively call these optimizations Circa and demonstrate improvements of up to 4.7x in storage and 3x in runtime over baseline implementations; we further show that Circa can be used on top of recent PI optimizations to obtain an additional 1.8x speedup.
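To make the idea concrete, here is a minimal plaintext sketch of a truncated sign-test ReLU. It assumes an illustrative fixed-point encoding with F fractional bits and drops the T low-order bits before the sign test; the stochastic error that share truncation would introduce in the actual cryptographic protocol is modeled here as a small random additive perturbation on the truncated value. The parameter values and the function name stochastic_relu are assumptions for illustration, not the paper's protocol.

```python
import numpy as np

F = 12  # fractional bits in the fixed-point encoding (illustrative choice)
T = 8   # low-order bits dropped before the sign test (illustrative choice)

def stochastic_relu(x, rng=None):
    """Plaintext simulation of a truncated sign-test ReLU.

    ReLU(x) = x if sign(x) >= 0 else 0, so it suffices to test the sign.
    Testing only the high-order bits of the fixed-point value shrinks the
    comparison circuit; the truncation error of the real protocol is
    modeled as a random -1/0/+1 carry, so inputs near zero can
    occasionally receive the wrong sign (a stochastic fault).
    """
    if rng is None:
        rng = np.random.default_rng()
    # Encode to fixed point with F fractional bits.
    fx = np.round(np.asarray(x) * (1 << F)).astype(np.int64)
    # Drop the T low-order bits (arithmetic shift preserves the sign).
    trunc = fx >> T
    # Modeled stochastic truncation error: a random carry in {-1, 0, +1}.
    trunc = trunc + rng.integers(-1, 2, size=fx.shape)
    # Apply the (possibly faulty) sign test to the full-precision input.
    return np.where(trunc >= 0, x, 0.0)

# Example: faults can only occur for inputs of magnitude below roughly
# 2^(T-F); larger activations always get the correct sign.
print(stochastic_relu(np.array([-0.5, -0.001, 0.001, 0.5])))
```

In this simplified model, a sign error is possible only when |x| is below about 2^(T-F), which is consistent with the abstract's claim that the resulting faults are rare, small, and well tolerated by neural network inference.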