Private inference (PI) enables inference directly on cryptographically secure data. While promising to address many privacy issues, it has seen limited use due to extreme runtimes. Unlike plaintext inference, where latency is dominated by FLOPs, in PI non-linear functions (namely ReLU) are the bottleneck. Thus, practical PI demands novel ReLU-aware optimizations. To reduce PI latency we propose a gradient-based algorithm that selectively linearizes ReLUs while maintaining prediction accuracy. We evaluate our algorithm on several standard PI benchmarks. The results demonstrate up to $4.25\%$ more accuracy (iso-ReLU count at 50K) or $2.2\times$ less latency (iso-accuracy at 70\%) than the current state of the art and advance the Pareto frontier across the latency-accuracy space. To complement empirical results, we present a "no free lunch" theorem that sheds light on how and when network linearization is possible while maintaining prediction accuracy.
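To make the idea of selective, gradient-based ReLU linearization concrete, the snippet below is a minimal sketch of one way such a scheme could be set up; it is not the paper's actual algorithm. It assumes a hypothetical per-channel gate that interpolates between ReLU and the identity, with an L1 penalty pushing gates toward zero so the corresponding ReLUs can be dropped (linearized) at inference time.

```python
import torch
import torch.nn as nn

class GatedReLU(nn.Module):
    """Illustrative gated ReLU: a learnable gate interpolates between
    ReLU (gate ~ 1) and the identity (gate ~ 0). A sparsity penalty on
    the gates encourages most units to become linear, which is what
    removes the PI latency bottleneck. Names and granularity (per-channel)
    are assumptions for this sketch, not the method from the paper."""

    def __init__(self, num_channels: int):
        super().__init__()
        # One gate per channel, initialized to keep every ReLU.
        self.gate = nn.Parameter(torch.ones(num_channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Clamp gates to [0, 1] and broadcast over (N, C, H, W) inputs.
        g = torch.clamp(self.gate, 0.0, 1.0).view(1, -1, 1, 1)
        return g * torch.relu(x) + (1.0 - g) * x

    def sparsity_penalty(self) -> torch.Tensor:
        # L1 penalty: smaller total gate mass = fewer surviving ReLUs.
        return torch.clamp(self.gate, 0.0, 1.0).sum()


# Training would add the penalty to the task loss, e.g.:
#   loss = criterion(model(x), y) + lam * sum(m.sparsity_penalty()
#              for m in model.modules() if isinstance(m, GatedReLU))
# After training, gates near zero are replaced by the identity,
# yielding a network with a reduced ReLU count.
```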