In this paper, we propose and study a distributed and secure algorithm for computing dominant (or truncated) singular value decompositions (SVD) of large and distributed data matrices. We consider the scenario where each node privately holds a subset of columns and only exchanges "safe" information with other nodes in a collaborative effort to calculate a dominant SVD for the whole matrix. In the framework of alternating direction methods of multipliers (ADMM), we propose a novel formulation for building consensus by equalizing subspaces spanned by splitting variables instead of equalizing the variables themselves. This technique greatly relaxes feasibility restrictions and accelerates convergence significantly, while at the same time yielding simple subproblems. We design several algorithmic features, including a low-rank multiplier formula and mechanisms for controlling subproblem solution accuracies, to increase the algorithm's computational efficiency and reduce its communication overhead. More importantly, unlike most existing distributed or parallelized algorithms, our algorithm preserves the privacy of locally-held data; that is, none of the nodes can recover the data stored in another node through information exchanged during communications. We present convergence analysis results, including a worst-case complexity estimate, and extensive experimental results indicating that the proposed algorithm, while safely guarding data privacy, has a strong potential to deliver a cutting-edge performance, especially when communication costs are relatively high.
翻译:在本文中,我们提出并研究一个分布和安全的算法,用于计算大型和分布式数据矩阵的主要(或缺线的)单值分解。我们考虑了每个节点私人持有一组分栏和仅与其他节点交换“安全”信息,以协力计算整个矩阵的支配性SVD。在乘数的交替方向方法(ADMM)框架内,我们提出一个新的公式,通过将分布式或平行的算法等同起来,使分解变量的子空间相互平衡,而不是使变量本身等同,以建立共识。这一技术大大放松了可行性限制,加快了趋同速度,同时产生了简单的子问题。我们设计了几种算法特征,包括低层次的乘数公式和机制,用于控制子问题解决方案的精度,以提高算法的计算效率并减少其通信间接费用。更重要的是,与大多数现有的分布式或平行的算法不同,我们的算法维护了当地掌握数据的隐私。也就是说,没有一个节点能够通过另一个节点恢复储存的数据,同时产生简单的分解问题。我们设计了几种算法,我们提出了一种广泛的精确性分析,在比较复杂的实验性数据,我们提出了一种潜在的精确性分析。