Increasingly, datasets are stored in a distributed manner because of memory limitations or data privacy requirements. The generalized eigenvalue problem (GEP) plays a vital role in a large family of high-dimensional statistical models. However, existing distributed methods for eigenvalue decomposition cannot be applied to the GEP because of the divergence of the empirical covariance matrix. Here we propose a general distributed GEP framework that requires only one-shot communication. If the symmetric data covariance has repeated eigenvalues, e.g., in canonical correlation analysis, we further modify the method for better convergence. We analyze the approximation error theoretically and characterize its relation to the divergence of the data covariance, the eigenvalues of the empirical data covariance, and the number of local servers. Numerical experiments demonstrate the effectiveness of the proposed algorithms.
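To make the one-shot communication setting concrete, here is a minimal NumPy sketch of a generic one-shot baseline. It is not the algorithm proposed in this work. It assumes that each local server holds a symmetric matrix A and a symmetric positive-definite matrix B (for example, local empirical covariances), solves its local GEP A v = λ B v by Cholesky reduction, and sends a basis of its top-k subspace to a central server exactly once. The center then averages the projection matrices and keeps the top-k eigenvectors, in the spirit of projection-averaging distributed PCA. The function names `local_top_k` and `one_shot_gep` are hypothetical.

```python
import numpy as np


def local_top_k(A, B, k):
    """Top-k generalized eigenvectors of A v = lam B v (A symmetric, B SPD).

    Uses the Cholesky reduction B = L L^T, which turns the GEP into the
    standard symmetric problem C u = lam u with C = L^{-1} A L^{-T}, v = L^{-T} u.
    """
    L = np.linalg.cholesky(B)                 # B = L @ L.T
    C = np.linalg.solve(L, np.linalg.solve(L, A).T)  # L^{-1} A L^{-T}
    C = (C + C.T) / 2                         # symmetrize against round-off
    _, U = np.linalg.eigh(C)                  # eigenvalues in ascending order
    U_top = U[:, ::-1][:, :k]                 # columns for the k largest eigenvalues
    return np.linalg.solve(L.T, U_top)        # map back: v = L^{-T} u


def one_shot_gep(local_pairs, k):
    """Hypothetical one-shot aggregation: average local projections, re-extract top-k."""
    d = local_pairs[0][0].shape[0]
    P = np.zeros((d, d))
    for A, B in local_pairs:                  # in practice, each term is computed on its own server
        V = local_top_k(A, B, k)
        Q, _ = np.linalg.qr(V)                # orthonormal basis of the local subspace
        P += Q @ Q.T                          # the only message sent to the center
    P /= len(local_pairs)
    _, W = np.linalg.eigh(P)
    return W[:, ::-1][:, :k]                  # orthonormal basis of the aggregated subspace
```

Aggregating projection matrices rather than raw eigenvectors sidesteps the sign and rotation ambiguities between servers. The sketch assumes a clear gap between the k-th and (k+1)-th generalized eigenvalues, and the returned columns span the estimated subspace in the Euclidean sense rather than being B-orthonormal generalized eigenvectors.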