In this paper, we tackle a significant challenge in PCA: heterogeneity. When data are collected from different sources that exhibit heterogeneous trends while still sharing some congruency, it is critical to extract the shared knowledge while retaining the unique features of each source. To this end, we propose personalized PCA (PerPCA), which uses mutually orthogonal global and local principal components to encode both unique and shared features. We show that, under mild conditions, both unique and shared features can be identified and recovered by solving a constrained optimization problem, even if the covariance matrices are vastly different. We also design a fully federated algorithm, inspired by distributed Stiefel gradient descent, to solve the problem. The algorithm introduces a new group of operations called generalized retractions to handle the orthogonality constraints, and requires only the global PCs to be shared across sources. We prove the linear convergence of the algorithm under suitable assumptions. Comprehensive numerical experiments highlight PerPCA's superior performance in feature extraction and prediction from heterogeneous datasets. As a systematic approach to decoupling shared and unique features in heterogeneous datasets, PerPCA finds applications in several tasks, including video segmentation, topic extraction, and distributed clustering.
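To make the setup concrete, here is a minimal NumPy sketch. It assumes the constrained problem takes the form: maximize the sum over sources i of tr(UᵀS_iU) + tr(V_iᵀS_iV_i), subject to UᵀU = I, V_iᵀV_i = I and UᵀV_i = 0, where S_i is the covariance of source i, U holds the global PCs and V_i the local PCs. This is an illustrative assumption, not a statement of the paper's exact formulation. The sketch also swaps the paper's generalized retractions for a plain thin-QR retraction plus an explicit projection onto the complement of U, and it ignores the coupling between U and V_i in the U step. The names `perpca_sketch` and `orthonormalize` are hypothetical.

```python
import numpy as np


def orthonormalize(a):
    """Thin-QR retraction: an orthonormal basis for the column span of `a`."""
    q, _ = np.linalg.qr(a)
    return q


def perpca_sketch(covs, r_global, r_local, eta=0.5, rounds=200, seed=0):
    """Simplified federated PerPCA-style sketch (hypothetical, not the paper's method).

    covs: list of d x d covariance matrices, one per source (client).
    Returns global PCs U (d x r_global) and one local basis V_i
    (d x r_local) per source, with U^T V_i = 0 for every i.
    """
    rng = np.random.default_rng(seed)
    d = covs[0].shape[0]
    u = orthonormalize(rng.standard_normal((d, r_global)))
    vs = [orthonormalize(rng.standard_normal((d, r_local))) for _ in covs]
    for _ in range(rounds):
        # Each client takes an ascent step on tr(U^T S_i U); in a federated
        # deployment only this d x r_global matrix would be sent to the server.
        proposals = [u + eta * s @ u for s in covs]
        # Server averages the proposals and retracts to orthonormal columns.
        u = orthonormalize(np.mean(proposals, axis=0))
        for i, s in enumerate(covs):
            # Local ascent step on tr(V_i^T S_i V_i) ...
            v = vs[i] + eta * s @ vs[i]
            # ... remove the global subspace so that U^T V_i = 0 ...
            v -= u @ (u.T @ v)
            # ... and restore orthonormal columns.
            vs[i] = orthonormalize(v)
    return u, vs


# Synthetic heterogeneous sources: a shared direction e0 (variance 3), a
# source-specific direction e1, e2 or e3 (variance 4), and isotropic noise.
d = 6
e = np.eye(d)
covs = [3 * np.outer(e[0], e[0]) + 4 * np.outer(e[k], e[k]) + 0.1 * np.eye(d)
        for k in (1, 2, 3)]
u, vs = perpca_sketch(covs, r_global=1, r_local=1)
print(abs(u[0, 0]))                                    # close to 1: U aligns with e0
print([abs(v[k, 0]) for k, v in zip((1, 2, 3), vs)])   # each close to 1
```

In this toy example every source's own leading PC is its local direction (variance 4.1 versus 3.1), so running PCA separately on each source would miss the shared feature. Because the server averages the clients' steps, the U update reduces to a power iteration on I + ηS̄, with S̄ the average covariance, which still recovers e0, while each V_i picks up its source-specific direction in the orthogonal complement.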