Principal component analysis (PCA) is an essential algorithm for dimensionality reduction in many data science domains. We address the problem of performing a federated PCA on private data distributed among multiple data providers while ensuring data confidentiality. Our solution, SF-PCA, is an end-to-end secure system that preserves the confidentiality of both the original data and all intermediate results in a passive-adversary model with up to all-but-one colluding parties. SF-PCA jointly leverages multiparty homomorphic encryption, interactive protocols, and edge computing to efficiently interleave computations on local cleartext data with operations on collectively encrypted data. SF-PCA obtains results as accurate as non-secure centralized solutions, independently of the data distribution among the parties. It scales linearly or better with the dataset dimensions and with the number of data providers. SF-PCA is more precise than existing approaches that approximate the solution by combining local analysis results, and between 3x and 250x faster than privacy-preserving alternatives based solely on secure multiparty computation or homomorphic encryption. Our work demonstrates the practical applicability of secure and federated PCA on private distributed datasets.
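The claim that the result can match a centralized analysis regardless of how rows are split among the parties can be illustrated with a minimal cleartext sketch. It is not SF-PCA's protocol: nothing is encrypted, the plain sums merely stand in for aggregation that a secure system would perform on collectively encrypted values, and a full covariance eigendecomposition is used as a simple substitute for whatever decomposition the system actually runs. The function names `local_statistics`, `federated_pca`, and `centralized_pca` are hypothetical, and the data is assumed to be horizontally partitioned (each party holds a subset of rows).

```python
import numpy as np


def local_statistics(block):
    """Cleartext statistics one data provider computes on its own rows."""
    # Row count, per-feature sums, and the local Gram matrix X_i^T X_i.
    return block.shape[0], block.sum(axis=0), block.T @ block


def top_components(cov, n_components):
    """Largest eigenpairs of a symmetric covariance matrix."""
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvals[order], eigvecs[:, order]


def federated_pca(blocks, n_components):
    """Pool per-party statistics, then decompose the pooled covariance.

    Assumption: the sums below are computed in the clear for illustration;
    a secure system would aggregate these quantities without revealing them.
    """
    stats = [local_statistics(b) for b in blocks]
    n = sum(s[0] for s in stats)
    total = sum(s[1] for s in stats)
    gram = sum(s[2] for s in stats)
    mean = total / n
    # sum_k (x_k - mean)(x_k - mean)^T = X^T X - n * mean mean^T
    cov = (gram - n * np.outer(mean, mean)) / (n - 1)
    return top_components(cov, n_components)


def centralized_pca(data, n_components):
    """Reference PCA on the pooled data, for comparison only."""
    return top_components(np.cov(data, rowvar=False), n_components)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
    # Deliberately uneven split among three hypothetical parties.
    blocks = np.split(data, [10, 150])
    fed_vals, fed_vecs = federated_pca(blocks, 2)
    cen_vals, cen_vecs = centralized_pca(data, 2)
    print(np.allclose(fed_vals, cen_vals))
```

Because the pooled Gram matrix and feature sums are exact sums of the local ones, the covariance matrix, and hence the principal components, do not depend on how the rows are divided among the parties.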