Personalized PCA: Decoupling Shared and Unique Features

In this paper, we tackle a significant challenge in PCA: heterogeneity. When data are collected from different sources with heterogeneous trends while still sharing some congruency, it is critical to extract shared knowledge while retaining unique features of each source. To this end, we propose personalized PCA (PerPCA), which uses mutually orthogonal global and local principal components to encode both unique and shared features. We show that, under mild conditions, both unique and shared features can be identified and recovered by a constrained optimization problem, even if the covariance matrices are immensely different. Also, we design a fully federated algorithm inspired by distributed Stiefel gradient descent to solve the problem. The algorithm introduces a new group of operations called generalized retractions to handle orthogonality constraints, and only requires global PCs to be shared across sources. We prove the linear convergence of the algorithm under suitable assumptions. Comprehensive numerical experiments highlight PerPCA's superior performance in feature extraction and prediction from heterogeneous datasets. As a systematic approach to decouple shared and unique features from heterogeneous datasets, PerPCA finds applications in several tasks including video segmentation, topic extraction, and distributed clustering.
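The mutual-orthogonality idea in the abstract can be illustrated with a small sketch. This is not the PerPCA algorithm and does not reproduce its generalized retractions. It only shows, with random bases and hypothetical sizes (`d`, `r_global`, `r_local`, and three sources are assumptions made for illustration), what it means for local components to be orthogonal to shared global ones, and how a sample then splits into a shared part and a source-specific part.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r_global, r_local = 6, 2, 1  # hypothetical dimensions, not from the paper

# Hypothetical global (shared) components: orthonormal columns obtained via QR.
U, _ = np.linalg.qr(rng.standard_normal((d, r_global)))

def local_basis(raw, U):
    """Orthonormal basis for the part of `raw` that is orthogonal to span(U)."""
    residual = raw - U @ (U.T @ raw)  # project out the shared directions
    V, _ = np.linalg.qr(residual)     # re-orthonormalize what is left
    return V

# One local basis per source (three hypothetical sources).
V_list = [local_basis(rng.standard_normal((d, r_local)), U) for _ in range(3)]

# Split one sample from source 0 into shared, unique, and leftover parts.
x = rng.standard_normal(d)
shared = U @ (U.T @ x)
unique = V_list[0] @ (V_list[0].T @ x)
leftover = x - shared - unique
```

Because each local basis is built in the orthogonal complement of the global one, the shared and unique parts of a sample never overlap, which is the property that makes the two kinds of features separable in this toy setting.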

Related content

PCA

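In statistics, principal component analysis (PCA) is a method that projects data from a higher-dimensional space into a lower-dimensional space by maximizing the variance along each dimension. Given a collection of points in two-, three-, or higher-dimensional space, a "best-fitting" line can be defined as the line that minimizes the average squared distance from the points to the line. The next best-fitting line can be chosen in the same way from the directions perpendicular to the first. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called principal components.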
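The definition above can be sketched in a few lines of NumPy. This is a minimal illustration of standard PCA through the singular value decomposition of the centered data, not code from the paper; the helper name `pca` and the synthetic data are assumptions made for the example.

```python
import numpy as np

def pca(X, k):
    """Return the top-k principal components (as columns) and their variances."""
    Xc = X - X.mean(axis=0)                       # center each feature
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k].T                         # orthonormal directions
    variances = s[:k] ** 2 / (X.shape[0] - 1)     # variance along each direction
    return components, variances

# Synthetic data with very different spread along each axis.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3)) @ np.diag([3.0, 1.0, 0.2])

W, var = pca(X, 2)
Z = (X - X.mean(axis=0)) @ W                      # projected (lower-dimensional) data
```

The columns of `W` form an orthonormal basis, and the covariance matrix of the projected data `Z` is diagonal, which is the "uncorrelated dimensions" property described above.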
