Sparse inner product (SIP) has the attractive property of overhead being dominated by the intersection of inputs between parties, independent of the actual input size. It has intriguing prospects, especially for boosting machine learning on large-scale data, which is tangled with sparse data. In this paper, we investigate privacy-preserving SIP problems that have rarely been explored before. Specifically, we propose two concrete constructs, one requiring offline linear communication which can be amortized across queries, while the other has sublinear overhead but relies on the more computationally expensive tool. Our approach exploits state-of-the-art cryptography tools including garbled Bloom filters (GBF) and Private Information Retrieval (PIR) as the cornerstone, but carefully fuses them to obtain non-trivial overhead reductions. We provide formal security analysis of the proposed constructs and implement them into representative machine learning algorithms including k-nearest neighbors, naive Bayes classification and logistic regression. Compared to the existing efforts, our method achieves $2$-$50\times$ speedup in runtime and up to $10\times$ reduction in communication.
翻译:微缩内产物( SIP) 具有吸引人的间接管理特性, 由各方投入的交汇点决定, 与实际投入大小无关。 它具有令人感兴趣的前景, 特别是推动机器学习大规模数据, 这些数据与稀有数据交织在一起。 在本文中, 我们调查了以前很少探讨过的隐私保护SIP问题。 具体地说, 我们提议了两种混凝土结构, 一种要求离线线线通信, 可在查询之间进行摊销, 而另一种则有亚线性间接管理, 但要依靠更昂贵的计算工具。 我们的方法利用最新的加密工具, 包括装饰的Bloom过滤器(GBF)和私人信息检索器(PIR)作为基石, 但是仔细地将它们连接起来, 以获得非边际间接投资的削减。 我们对拟议建筑进行正式的安全分析, 并将其应用到具有代表性的机器学习算法中, 包括k- 远端邻居、 幼贝斯分类和物流回归。 与现有的努力相比, 我们的方法在运行过程中和10\ Time time 。