Learning 1-Dimensional Submanifolds for Subsequent Inference on Random Dot Product Graphs

A random dot product graph (RDPG) is a generative model for networks in which vertices correspond to positions in a latent Euclidean space and edge probabilities are determined by the dot products of the latent positions. We consider RDPGs for which the latent positions are randomly sampled from an unknown $1$-dimensional submanifold of the latent space. In principle, restricted inference, i.e., procedures that exploit the structure of the submanifold, should be more effective than unrestricted inference; however, it is not clear how to conduct restricted inference when the submanifold is unknown. We submit that techniques for manifold learning can be used to learn the unknown submanifold well enough to realize benefit from restricted inference. To illustrate, we test $1$- and $2$-sample hypotheses about the Fr\'{e}chet means of small communities of vertices, using the complete set of vertices to infer latent structure. We propose test statistics that deploy the Isomap procedure for manifold learning, using shortest path distances on neighborhood graphs constructed from estimated latent positions to estimate arc lengths on the unknown $1$-dimensional submanifold. Unlike conventional applications of Isomap, the estimated latent positions do not lie on the submanifold of interest. We extend existing convergence results for Isomap to this setting and use them to demonstrate that, as the number of auxiliary vertices increases, the power of our test converges to the power of the corresponding test when the submanifold is known. Finally, we apply our methods to an inference problem that arises in studying the connectome of the Drosophila larval mushroom body. The univariate learnt manifold test rejects ($p<0.05$), while the multivariate ambient space test does not ($p\gg0.05$), illustrating the value of identifying and exploiting low-dimensional structure for subsequent inference.
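As a rough illustration of the two ingredients described above, the following sketch samples an RDPG whose latent positions lie on a 1-dimensional curve, then estimates arc length along that curve from shortest path distances on a neighborhood graph. It is a sketch under stated assumptions, not the procedure from the paper: the Hardy-Weinberg curve, the sample size, the noise level, and the neighborhood radius are all hypothetical choices, and the "estimated" latent positions are simulated by adding small Gaussian noise to the true positions rather than by embedding the sampled graph.

```python
import heapq
import math
import random


def curve(t):
    """Hypothetical 1-D submanifold: the Hardy-Weinberg curve in R^3."""
    return (t * t, 2 * t * (1 - t), (1 - t) ** 2)


def sample_rdpg(latent, rng):
    """Sample a symmetric, hollow adjacency matrix with P(i ~ j) = <x_i, x_j>."""
    n = len(latent)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = sum(a * b for a, b in zip(latent[i], latent[j]))
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1
    return adj


def shortest_path_lengths(points, radius, source):
    """Dijkstra on the radius-neighborhood graph with Euclidean edge weights."""
    n = len(points)
    dist = [math.inf] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v in range(n):
            w = math.dist(points[u], points[v])
            if v != u and w <= radius and d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def arc_length(t0, t1, steps=10000):
    """Polyline approximation of the curve's arc length between t0 and t1."""
    pts = [curve(t0 + (t1 - t0) * k / steps) for k in range(steps + 1)]
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))


rng = random.Random(0)
n = 300
ts = sorted(rng.uniform(0.1, 0.9) for _ in range(n))
latent = [curve(t) for t in ts]

# Generative model: edges are independent Bernoulli draws given the latent positions.
adj = sample_rdpg(latent, rng)
density = sum(map(sum, adj)) / (n * (n - 1))

# Assumption: noisy copies of the true positions stand in for estimated latent
# positions, which in general do not lie exactly on the curve.
noisy = [tuple(c + rng.gauss(0, 0.005) for c in x) for x in latent]

# Isomap-style step: shortest path distance approximates arc length on the curve.
est = shortest_path_lengths(noisy, radius=0.1, source=0)
true_len = arc_length(ts[0], ts[-1])

print(f"edge density: {density:.3f}")
print(f"estimated arc length {est[-1]:.3f} vs true {true_len:.3f}")
```

The points on this curve lie on the probability simplex, so every dot product is a valid edge probability. The radius must be large enough to keep the neighborhood graph connected yet small enough that each edge is a good local approximation of the curve; in practice that choice depends on the sample size and on the accuracy of the estimated positions.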
