We consider the problem of interpretable network representation learning for samples of network-valued data. We propose the Principal Component Analysis for Networks (PCAN) algorithm, which identifies statistically meaningful low-dimensional representations of a network sample via subgraph count statistics. The PCAN procedure provides an interpretable framework with which one can readily visualize, explore, and formulate predictive models for network samples. We furthermore introduce a fast sampling-based algorithm, sPCAN, which is significantly more computationally efficient than PCAN yet retains its interpretability. We investigate the relationship between the two methods and analyze their large-sample properties under the common regime in which the sample of networks is a collection of kernel-based random graphs. We show that, under this regime, the embeddings of the sPCAN method satisfy a central limit theorem and, moreover, that the population-level embeddings of PCAN and sPCAN are equivalent. We assess PCAN's ability to visualize, cluster, and classify observations in network samples arising in nature, including functional connectivity network samples and dynamic networks describing the political co-voting habits of the U.S. Senate. Our analyses reveal that the proposed algorithm provides informative and discriminatory features describing the networks in each sample. The PCAN and sPCAN methods build on the current literature on network representation learning and set the stage for a new line of research in interpretable learning on network-valued data. Software for the PCAN and sPCAN methods is publicly available at https://www.github.com/jihuilee/.
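To make the general idea concrete, the following is a minimal sketch of principal component analysis applied to subgraph count statistics. It is not the authors' implementation or the released software. The choice of subgraphs (edges, two-stars, triangles), the density normalisation, the centering-only preprocessing, and the function names `subgraph_densities`, `pca_embed`, and `random_graph` are all assumptions made for illustration. It assumes simple undirected graphs given as symmetric 0/1 adjacency matrices with a zero diagonal.

```python
import numpy as np


def subgraph_densities(adj):
    """Edge, two-star, and triangle densities of a simple undirected graph.

    Illustrative feature map (an assumption, not the paper's exact statistics).
    """
    a = np.asarray(adj, dtype=float)
    n = a.shape[0]
    deg = a.sum(axis=1)

    edges = a.sum() / 2.0                      # each edge is counted twice in A
    two_stars = (deg * (deg - 1) / 2.0).sum()  # paths of length 2, by centre node
    triangles = np.trace(a @ a @ a) / 6.0      # each triangle gives 6 closed walks

    # Normalise by the number of copies present in the complete graph K_n.
    pairs = n * (n - 1) / 2.0
    triples = n * (n - 1) * (n - 2) / 6.0
    return np.array([
        edges / pairs,
        two_stars / (3.0 * triples),
        triangles / triples,
    ])


def pca_embed(features, k=2):
    """Project centred feature rows onto their top-k principal components."""
    x = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    components = vt[:k]
    return x @ components.T, components


def random_graph(n, p, rng):
    """Erdos-Renyi graph, used here only as hypothetical toy data."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(float)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy sample: ten sparse and ten dense networks on 30 nodes each.
    sample = [random_graph(30, 0.1, rng) for _ in range(10)]
    sample += [random_graph(30, 0.4, rng) for _ in range(10)]

    features = np.vstack([subgraph_densities(g) for g in sample])
    scores, components = pca_embed(features, k=2)
    print(scores.shape)   # one low-dimensional embedding per network
    print(components)     # loadings on (edge, two-star, triangle) densities
```

Because each principal component is a linear combination of named subgraph densities, the loadings in `components` can be read directly, which is the sense of interpretability the sketch is meant to illustrate. A sampling-based variant in the spirit of sPCAN would presumably replace the exact counts with estimates from sampled node subsets, but that step is not shown here.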