Joint spectral embeddings facilitate the analysis of multiple-network data by simultaneously mapping the vertices of each network to points in Euclidean space, where statistical inference is then performed. In this work, we consider one such joint embedding technique, the omnibus embedding of arXiv:1705.09355, which has been successfully used for community detection, anomaly detection, and hypothesis testing tasks. To date, the theoretical properties of this method have only been established under the strong assumption that the networks are conditionally i.i.d. random dot product graphs. In practice, we anticipate that multiple networks will possess different structures, necessitating further analysis. Herein, we take a first step in characterizing the theoretical properties of the omnibus embedding in the presence of heterogeneous network data. Under a simple latent position model, we uncover a bias-variance tradeoff for latent position estimation. We establish an explicit bias expression, derive a uniform concentration bound on the residual, and prove a central limit theorem characterizing the distributional properties of these estimates. These explicit bias and variance expressions enable us to state sufficient conditions for exact recovery in community detection tasks and to develop a pivotal test statistic for determining whether two graphs share the same set of latent positions, demonstrating that accurate inference is achievable despite the estimator's inconsistency. These results are demonstrated in several experimental settings, where statistical procedures utilizing the omnibus embedding are competitive with, and oftentimes preferable to, comparable embedding techniques. These observations accentuate the viability of the omnibus embedding for multiple graph inference beyond the homogeneous network setting.
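The abstract does not restate how the omnibus embedding is constructed. As a minimal sketch, assuming the construction of arXiv:1705.09355 (an omnibus matrix whose (i, j) block is the average of the i-th and j-th adjacency matrices, followed by a rank-d spectral embedding), it might look as follows in Python with NumPy. The function names, the use of the d algebraically largest eigenvalues, and the clipping of negative eigenvalues are illustrative assumptions, not details taken from this work.

```python
import numpy as np


def omnibus_matrix(adjs):
    """Build the mn x mn omnibus matrix from m symmetric n x n adjacency matrices.

    Block (i, j) is (A_i + A_j) / 2, so each diagonal block is A_i itself.
    """
    m = len(adjs)
    n = adjs[0].shape[0]
    M = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(m):
            M[i * n:(i + 1) * n, j * n:(j + 1) * n] = (adjs[i] + adjs[j]) / 2.0
    return M


def omnibus_embedding(adjs, d):
    """Return an array of shape (m, n, d): one d-dimensional point per vertex per graph.

    Assumption: embed with the d algebraically largest eigenvalues of M,
    clipping any negative values to zero before taking square roots.
    """
    m = len(adjs)
    n = adjs[0].shape[0]
    M = omnibus_matrix(adjs)
    vals, vecs = np.linalg.eigh(M)          # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:d]        # indices of the d largest eigenvalues
    scale = np.sqrt(np.clip(vals[top], 0.0, None))
    Z = vecs[:, top] * scale                # rows: graph 1's vertices, then graph 2's, ...
    return Z.reshape(m, n, d)
```

Calling `omnibus_embedding([A1, A2], d)` on two adjacency matrices over the same vertex set returns an array whose slices `Z[0]` and `Z[1]` hold the estimated latent positions for each graph, which is the form that downstream procedures such as clustering or two-graph testing would consume.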