Private inference (PI) enables inference directly on cryptographically secure data. While promising to address many privacy issues, it has seen limited use due to extreme runtimes. Unlike plaintext inference, where latency is dominated by FLOPs, in PI the non-linear functions (namely ReLU) are the bottleneck. Thus, practical PI demands novel ReLU-aware optimizations. To reduce PI latency, we propose a gradient-based algorithm that selectively linearizes ReLUs while maintaining prediction accuracy. We evaluate our algorithm on several standard PI benchmarks. The results demonstrate up to $4.25\%$ higher accuracy (iso-ReLU count at 50K) or $2.2\times$ lower latency (iso-accuracy at $70\%$) than the current state of the art, and advance the Pareto frontier across the latency-accuracy space. To complement the empirical results, we present a "no free lunch" theorem that sheds light on how and when network linearization is possible while maintaining prediction accuracy. Public code is available at \url{https://github.com/NYU-DICE-Lab/selective_network_linearization}.
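The abstract does not spell out how ReLUs are selected, so the following is only a minimal sketch of one way a gradient-based selection could work, not the released implementation. It assumes each ReLU carries a learnable mask `c` in $[0, 1]$ that blends the ReLU with the identity, and that an L1 penalty of weight `lam` pushes masks toward zero. All names here (`masked_relu`, `mask_gradient`, `update_mask`, `binarize`, `relu_count`, `lam`) are hypothetical and chosen for illustration.

```python
def relu(x):
    return max(0.0, x)


def masked_relu(x, c):
    """Blend ReLU and identity: c = 1 keeps the ReLU, c = 0 linearizes it."""
    return c * relu(x) + (1.0 - c) * x


def mask_gradient(x, c, upstream, lam):
    """Gradient of (task loss + lam * |c|) with respect to the mask c.

    d(masked_relu)/dc = relu(x) - x, and `upstream` is dLoss/dOutput.
    The penalty term uses d|c|/dc = 1, which holds for c > 0 (assumed here).
    """
    return upstream * (relu(x) - x) + lam


def update_mask(c, grad, lr):
    """One gradient step on the mask, clamped back into [0, 1]."""
    return min(1.0, max(0.0, c - lr * grad))


def binarize(masks, threshold=0.5):
    """After training, keep a ReLU only where its mask exceeds the threshold."""
    return [1.0 if c > threshold else 0.0 for c in masks]


def relu_count(masks):
    """Number of ReLUs that remain non-linear, i.e. the PI cost proxy."""
    return int(sum(masks))


if __name__ == "__main__":
    masks = [0.9, 0.2, 0.6]
    kept = binarize(masks)
    print("binary masks:", kept, "ReLU count:", relu_count(kept))
```

In this sketch, a unit whose input is positive contributes no task-loss gradient to its mask (ReLU and identity agree there), so only the penalty acts on it; units that often see negative inputs are the ones the task loss defends. Under these assumptions, increasing `lam` trades accuracy for a lower ReLU count.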