Network data are commonly collected in a variety of applications, representing either directly measured or statistically inferred connections between features of interest. In an increasing number of domains, these networks are collected over time, such as interactions between users of a social media platform on different days, or across multiple subjects, such as in multi-subject studies of brain connectivity. When analyzing multiple large networks, dimensionality reduction techniques are often used to embed networks in a more tractable low-dimensional space. To this end, we develop a framework for principal components analysis (PCA) on collections of networks via a specialized tensor decomposition we term Semi-Symmetric Tensor PCA or SS-TPCA. We derive computationally efficient algorithms for computing our proposed SS-TPCA decomposition and establish statistical efficiency of our approach under a standard low-rank signal plus noise model. Remarkably, we show that SS-TPCA achieves the same estimation accuracy as classical matrix PCA, with error proportional to the square root of the number of vertices in the network rather than to the number of edges, as might be expected. Our framework inherits many of the strengths of classical PCA and is suitable for a wide range of unsupervised learning tasks, including identifying principal networks, isolating meaningful changepoints or outlying observations, and characterizing the "variability network" of the most varying edges. Finally, we demonstrate the effectiveness of our proposal on simulated data and on an example from empirical legal studies. The techniques used to establish our main consistency results are surprisingly straightforward and may find use in a variety of other network analysis problems.
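To make the idea of a semi-symmetric decomposition concrete, the following is a minimal sketch and not the authors' algorithm, which the abstract does not spell out. It assumes a rank-one model in which a stack of T symmetric p-by-p adjacency matrices is approximately `d * outer(v, v)` scaled per slice by a loading vector `u`, and it fits that model with a plain alternating power-style iteration. The function name `ss_tpca_rank1`, the initialization, and the stopping rule are all illustrative assumptions.

```python
import numpy as np


def ss_tpca_rank1(X, n_iter=100, tol=1e-10):
    """Illustrative rank-one fit X[:, :, t] ~ d * u[t] * outer(v, v).

    X is a (p, p, T) array whose slices X[:, :, t] are symmetric.
    Returns (d, v, u) with unit-norm v (length p) and u (length T).
    This is a hypothetical sketch, not the method from the paper.
    """
    p, p2, T = X.shape
    if p != p2:
        raise ValueError("each slice of X must be square")

    # Assumed initialization: weight every network equally.
    u = np.ones(T) / np.sqrt(T)
    v = np.zeros(p)

    for _ in range(n_iter):
        # Collapse the stack into one symmetric matrix using the weights u.
        M = np.einsum("ijt,t->ij", X, u)
        M = (M + M.T) / 2.0  # guard against tiny asymmetries

        # v-update: eigenvector with the largest eigenvalue in magnitude.
        eigvals, eigvecs = np.linalg.eigh(M)
        v = eigvecs[:, np.argmax(np.abs(eigvals))]

        # u-update: score each network against the pattern outer(v, v).
        u_new = np.einsum("ijt,i,j->t", X, v, v)
        norm = np.linalg.norm(u_new)
        if norm == 0.0:
            break
        u_new /= norm

        # u is invariant to the sign of v, so it can be compared directly.
        converged = np.linalg.norm(u_new - u) < tol
        u = u_new
        if converged:
            break

    # Scale of the fitted component.
    d = float(np.einsum("ijt,i,j,t->", X, v, v, u))
    return d, v, u
```

Under these assumptions, `outer(v, v)` plays the role of a "principal network" and `u` gives one score per observed network, so large jumps or isolated extreme values in `u` are the kind of signal one would inspect for changepoints or outlying observations.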