Scalable and Privacy-Preserving Federated Principal Component Analysis

David Froelicher, Hyunghoon Cho, Manaswitha Edupalli, Joao Sa Sousa, Jean-Philippe Bossuat, Apostolos Pyrgelis, Juan R. Troncoso-Pastoriza, Bonnie Berger, Jean-Pierre Hubaux

From arXiv. Also published at the IEEE Symposium on Security and Privacy 2023.

Principal component analysis (PCA) is an essential algorithm for dimensionality reduction in many data science domains. We address the problem of performing a federated PCA on private data distributed among multiple data providers while ensuring data confidentiality. Our solution, SF-PCA, is an end-to-end secure system that preserves the confidentiality of both the original data and all intermediate results in a passive-adversary model with up to all-but-one colluding parties. SF-PCA jointly leverages multiparty homomorphic encryption, interactive protocols, and edge computing to efficiently interleave computations on local cleartext data with operations on collectively encrypted data. SF-PCA obtains results as accurate as non-secure centralized solutions, independently of the data distribution among the parties. It scales linearly or better with the dataset dimensions and with the number of data providers. SF-PCA is more precise than existing approaches that approximate the solution by combining local analysis results, and between 3x and 250x faster than privacy-preserving alternatives based solely on secure multiparty computation or homomorphic encryption. Our work demonstrates the practical applicability of secure and federated PCA on private distributed datasets.
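The claim that a federated result can match a centralized one regardless of how rows are split among parties can be illustrated with a small cleartext sketch. This is not SF-PCA: it uses no encryption, it reveals the aggregated statistics (which SF-PCA is designed to keep confidential), and covariance aggregation is a technique swapped in here purely for illustration. The function names and the horizontal (row-wise) partitioning are assumptions.

```python
def local_statistics(rows):
    """Statistics one hypothetical data provider computes on its own rows."""
    d = len(rows[0])
    count = len(rows)
    col_sums = [sum(r[j] for r in rows) for j in range(d)]
    # Gram matrix X^T X of the local (uncentered) rows.
    gram = [[sum(r[a] * r[b] for r in rows) for b in range(d)] for a in range(d)]
    return count, col_sums, gram


def pooled_covariance(all_stats):
    """Combine per-party statistics into the global sample covariance matrix."""
    d = len(all_stats[0][1])
    n = sum(count for count, _, _ in all_stats)
    s = [sum(st[1][j] for st in all_stats) for j in range(d)]
    g = [[sum(st[2][a][b] for st in all_stats) for b in range(d)] for a in range(d)]
    # cov = (X^T X - s s^T / n) / (n - 1), with s the vector of column sums.
    return [[(g[a][b] - s[a] * s[b] / n) / (n - 1) for b in range(d)] for a in range(d)]


def centralized_covariance(rows):
    """Reference: sample covariance computed on the pooled rows directly."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    return [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
            for a in range(d)]


# Toy data, split unevenly among three hypothetical parties.
data = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 6.0], [6.0, 4.0]]
parties = [data[:1], data[1:4], data[4:]]
federated = pooled_covariance([local_statistics(p) for p in parties])
central = centralized_covariance(data)
```

Because the counts, column sums, and Gram matrices simply add up across parties, `federated` and `central` agree up to floating-point rounding however the rows are distributed. The principal components are then the eigenvectors of this matrix.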


Related content

PCA


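In statistics, principal component analysis (PCA) is a method that projects data from a higher-dimensional space into a lower-dimensional space by maximizing the variance along each dimension. Given a set of points in two-, three-, or higher-dimensional space, a "best-fit" line can be defined as the line that minimizes the average squared distance from the points to the line. The next best-fit line can be chosen in the same way from the directions perpendicular to the first line. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called the principal components.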
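The procedure described above (find the best-fit direction, then repeat among perpendicular directions) can be sketched in plain Python. This is a minimal illustration, assuming power iteration with deflation on the sample covariance matrix. The function names are hypothetical, and a production implementation would normally use a numerical library's eigendecomposition or SVD instead.

```python
import math


def covariance(points):
    """Sample covariance matrix of a list of equal-length points."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    return [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
            for a in range(d)]


def top_eigenpair(mat, iters=500):
    """Power iteration: dominant eigenvalue and unit eigenvector of a symmetric matrix."""
    d = len(mat)
    # Arbitrary non-symmetric start vector; assumed not orthogonal to the answer.
    v = [1.0 + j for j in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(mat[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # matrix maps v to zero; keep the current vector
            break
        v = [x / norm for x in w]
    # Rayleigh quotient v^T M v gives the eigenvalue for the unit vector v.
    eigval = sum(v[a] * sum(mat[a][b] * v[b] for b in range(d)) for a in range(d))
    return eigval, v


def principal_components(points, k):
    """First k (variance, direction) pairs, found one direction at a time."""
    cov = covariance(points)
    d = len(cov)
    components = []
    for _ in range(k):
        lam, v = top_eigenpair(cov)
        components.append((lam, v))
        # Deflation: subtract the found direction so the next pass
        # searches only among directions perpendicular to it.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return components


# Toy 2-D points lying close to the line y = x.
points = [(2.0, 1.9), (1.0, 1.1), (0.0, 0.1), (-1.0, -0.9), (-2.0, -2.2)]
(var1, pc1), (var2, pc2) = principal_components(points, 2)
```

Here `pc1` points roughly along the diagonal, which is the best-fit line through the points, and `pc2` is perpendicular to it and carries the small remaining variance.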
