Classic and deep generalized canonical correlation analysis (GCCA) algorithms seek low-dimensional common representations of data entities from multiple ``views'' (e.g., audio and image) using linear transformations and neural networks, respectively. When the views are acquired and stored at different computing agents (e.g., organizations and edge devices) and data sharing is undesired due to privacy or communication cost considerations, federated learning-based GCCA is well-motivated. In federated learning, the views are kept locally at the agents, and only limited, derived information is exchanged with a central server. However, applying existing GCCA algorithms in such federated settings may incur prohibitively high communication overhead. This work puts forth a communication-efficient federated learning framework for both linear and deep GCCA under the maximum variance (MAX-VAR) formulation. The overhead issue is addressed by aggressively compressing (via quantization) the information exchanged between the computing agents and a central controller. Compared to the unquantized version, our empirical study shows that the proposed algorithm enjoys a substantial reduction in communication overhead with virtually no loss in accuracy or convergence speed. Rigorous convergence analyses are also presented; this is a nontrivial effort, as generic federated optimization results do not cover the special problem structure of GCCA. Our result shows that the proposed algorithms for both linear and deep GCCA converge to critical points at a sublinear rate, even under heavy quantization and stochastic approximations. In addition, in the linear MAX-VAR case, the quantized algorithm approaches a global optimum at a geometric rate under reasonable conditions. Synthetic and real-data experiments showcase the effectiveness of the proposed approach.
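For concreteness, the linear MAX-VAR GCCA problem referenced above is commonly stated as follows; the notation here is the standard one for MAX-VAR and is assumed rather than taken from this paper: $\mathbf{X}_i \in \mathbb{R}^{N \times M_i}$ is the view held by agent $i$, $\mathbf{Q}_i$ is its linear transformation, and $\mathbf{G}$ is the sought common representation with orthonormal columns,
\[
\min_{\{\mathbf{Q}_i\},\, \mathbf{G}} \ \sum_{i=1}^{I} \big\| \mathbf{X}_i \mathbf{Q}_i - \mathbf{G} \big\|_F^2
\quad \text{s.t.} \quad \mathbf{G}^{\top} \mathbf{G} = \mathbf{I}.
\]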
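The following is a minimal sketch of one quantized communication round for the linear case, assuming a generic uniform quantizer, local least-squares updates for each $\mathbf{Q}_i$, and an SVD-based orthogonalization of the aggregated messages at the server. The names (\texttt{federated\_maxvar\_round}, \texttt{uniform\_quantize}), the quantizer, and the update schedule are illustrative assumptions, not the paper's exact algorithm.
\begin{verbatim}
import numpy as np

def uniform_quantize(v, bits=4):
    """Uniformly quantize the entries of v to 2**bits levels over their
    range. A generic quantizer for illustration only; the paper's exact
    quantization scheme is not reproduced here."""
    lo, hi = float(v.min()), float(v.max())
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    return np.round((v - lo) / scale) * scale + lo  # dequantized message

def federated_maxvar_round(views, G, bits=4):
    """One hypothetical round of quantized federated linear MAX-VAR:
    each agent i fits Q_i = argmin ||X_i Q - G||_F^2 locally, uploads a
    quantized version of X_i Q_i, and the server re-estimates G by
    orthogonalizing the average of the received messages."""
    msgs = []
    for X in views:
        Q = np.linalg.lstsq(X, G, rcond=None)[0]    # local least-squares fit
        msgs.append(uniform_quantize(X @ Q, bits))  # compress before upload
    M = np.mean(msgs, axis=0)                       # server-side aggregation
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ Vt  # nearest matrix with orthonormal columns (Procrustes)

# Toy usage: three views of 100 entities, shared dimension K = 5.
rng = np.random.default_rng(0)
views = [rng.standard_normal((100, m)) for m in (20, 30, 25)]
G = np.linalg.qr(rng.standard_normal((100, 5)))[0]
for _ in range(10):
    G = federated_maxvar_round(views, G)
\end{verbatim}
Note that only the quantized $N \times K$ messages travel over the network in this sketch, which is where the communication savings described above would come from.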