Labelled networks form a very common and important class of data, naturally appearing in numerous applications in science and engineering. A typical inference goal is to determine how the vertex labels (or {\em features}) affect the network's graph structure. A standard approach has been to partition the network into blocks grouped by distinct values of the feature of interest. A block-based random graph model -- typically a variant of the stochastic block model -- is then used to test for evidence of asymmetric behaviour within these feature-based communities. Nevertheless, the resulting communities often do not produce a natural partition of the graph. In this work, we introduce a new generative model, the feature-first block model (FFBM), which is more effective at describing vertex-labelled undirected graphs and also facilitates the use of richer queries on labelled networks. We develop a Bayesian framework for inference with this model, and we present a method to efficiently sample from the posterior distribution of the FFBM parameters. The FFBM's structure is kept deliberately simple so that the parameter values remain easy to interpret. We apply the proposed methods to a variety of network data to extract the most important features along which the vertices are partitioned. The main advantages of the proposed approach are that the whole feature space is used automatically, and features can be rank-ordered implicitly according to impact. Any features that do not significantly impact the high-level structure can be discarded to reduce the problem dimension. In cases where the available vertex features do not readily explain the community structure in the resulting network, the approach detects this and is protected against over-fitting. Results on several real-world datasets illustrate the performance of the proposed methods.