Federated learning (FL) is emerging as a privacy-aware alternative to classical cloud-based machine learning. In FL, sensitive data remains in local data silos and only aggregated parameters are exchanged. Hospitals and research institutions that are unwilling to share their data can therefore join a federated study without breaching confidentiality. Beyond the extreme sensitivity of biomedical data, its high dimensionality poses a challenge for federated genome-wide association studies (GWAS). In this article, we present a federated singular value decomposition (SVD) algorithm that meets the privacy-related and computational requirements of GWAS. Notably, the algorithm's transmission cost is independent of the number of samples and depends only weakly on the number of features, because the singular vectors associated with the samples are never exchanged and the vectors associated with the features are exchanged only for a fixed number of iterations. Although motivated by GWAS, the algorithm is generically applicable to both horizontally and vertically partitioned data.
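To make the communication pattern concrete, the following is a minimal sketch of a generic federated subspace (power) iteration in Python with NumPy. It is an illustration under stated assumptions, not the algorithm presented in this article: it assumes that samples are split across sites as rows, that all sites share the same feature columns, and that a trusted aggregator sums the site contributions. The function names `local_update`, `federated_subspace_iteration`, and `local_left_vectors` are hypothetical. The sketch also exchanges the feature-side matrix in every iteration, whereas the article's algorithm limits this to a fixed number of iterations.

```python
import numpy as np


def local_update(A_i, V):
    """Runs at one site. A_i has shape (n_i samples, m features).

    Returns A_i^T (A_i V), an (m, k) matrix whose size does not
    depend on the local number of samples n_i.
    """
    return A_i.T @ (A_i @ V)


def federated_subspace_iteration(site_data, k, n_iter=200, seed=0):
    """Illustrative aggregator loop (assumed setup, not the article's method).

    Only (m, k) feature-side matrices and k squared norms leave a site;
    the sample-side singular vectors are never transmitted.
    """
    rng = np.random.default_rng(seed)
    m = site_data[0].shape[1]
    # Random orthonormal start for the feature-side singular vectors.
    V, _ = np.linalg.qr(rng.standard_normal((m, k)))
    for _ in range(n_iter):
        # Each site sends an (m, k) matrix; the aggregator sums them,
        # which equals A^T A V for the row-stacked matrix A.
        H = sum(local_update(A_i, V) for A_i in site_data)
        # Re-orthonormalise and broadcast the new V to the sites.
        V, _ = np.linalg.qr(H)
    # Singular values from per-site squared column norms of A_i V.
    sq_norms = sum(np.sum((A_i @ V) ** 2, axis=0) for A_i in site_data)
    sigma = np.sqrt(sq_norms)
    return V, sigma


def local_left_vectors(A_i, V, sigma):
    """Runs at one site; the sample-side vectors stay local."""
    return (A_i @ V) / sigma
```

In this sketch each site transmits an m-by-k matrix per iteration plus k scalars at the end, so the traffic is independent of the number of samples, which mirrors the property claimed above. Each site can then call `local_left_vectors` on its own data to obtain its block of sample-side singular vectors without sharing it.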