Practical Lossless Federated Singular Vector Decomposition over Billion-Scale Data

With the enactment of privacy-preserving regulations such as GDPR, federated SVD has been proposed to enable SVD-based applications over different data sources without revealing the original data. However, many SVD-based applications, such as principal component analysis in genetic studies dealing with billion-scale data, are not well supported by existing federated SVD solutions. The crux is that these solutions, which adopt either differential privacy (DP) or homomorphic encryption (HE), suffer from accuracy loss caused by unremovable noise or from degraded efficiency due to inflated data. In this paper, we propose FedSVD, a practical lossless federated SVD method over billion-scale data that simultaneously achieves lossless accuracy and high efficiency. At the heart of FedSVD is a lossless matrix masking scheme delicately designed for SVD: 1) while adopting the masks to protect private data, FedSVD completely removes them from the final results of SVD to achieve lossless accuracy; and 2) as the masks do not inflate the data, FedSVD avoids extra computation and communication overhead during the factorization to maintain high efficiency. Experiments with real-world datasets show that FedSVD is over 10,000 times faster than the HE-based method and has an error 10 orders of magnitude smaller than the DP-based solution on SVD tasks. We further build and evaluate FedSVD over three real-world applications, principal component analysis (PCA), linear regression (LR), and latent semantic analysis (LSA), to show its superior performance in practice. On federated LR tasks, compared with two state-of-the-art solutions, FATE [17] and SecureML [19], FedSVD-LR is 100 times faster than SecureML and 10 times faster than FATE.
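
The abstract does not specify how the masks are built, so the following is only a minimal single-process sketch of the masking idea, under the assumption that the masks are random orthogonal matrices P and Q. With X' = P X Q and X' = U' Σ V'^T, the factors of X are recovered as U = P^T U' and V^T = V'^T Q^T, and the singular values are unchanged. The names `random_orthogonal` and `masked_svd` are hypothetical, and the federated parts (data split across parties, who generates and holds the masks) are omitted.

```python
import numpy as np

def random_orthogonal(n, rng):
    # Assumption: the Q factor of a Gaussian matrix serves as an orthogonal mask.
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

def masked_svd(x, rng):
    m, n = x.shape
    p = random_orthogonal(m, rng)  # left mask (hypothetical)
    q = random_orthogonal(n, rng)  # right mask (hypothetical)

    # The masked matrix has the same shape as x, so the data is not inflated.
    x_masked = p @ x @ q

    # Factorize only the masked matrix.
    u_m, s, vt_m = np.linalg.svd(x_masked, full_matrices=False)

    # Remove the masks: orthogonality gives P^T P = I and Q Q^T = I.
    u = p.T @ u_m
    vt = vt_m @ q.T
    return u, s, vt

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 4))
u, s, vt = masked_svd(x, rng)

# Reference singular values computed directly on the unmasked data.
s_ref = np.linalg.svd(x, compute_uv=False)
```

In this sketch the recovered factors reconstruct `x` and the singular values match `s_ref` up to floating-point rounding, which is the sense in which removing the masks loses no accuracy.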
