Assessing model goodness of fit, a first step in data analysis, is easy to state but often difficult to carry out in practice, particularly for large and sparse data or for small-sample but structured data. We focus on this fundamental problem for relational data that can be represented in the form of a network: given a single observed network, does the proposed model fit the data? Specifically, we construct finite-sample tests for three variants of the stochastic blockmodel (SBM). The main building blocks are the versions with known block assignments, which we then extend to the latent-block case. We describe the Markov bases and the marginal polytope of these models. The methodology extends to any mixture of log-linear models on discrete data and, as such, is the first application of algebraic statistics sampling to latent-variable models.
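As a rough illustration of the known-block-assignment case, the following minimal sketch estimates a conditional p-value by a random walk over graphs that share the observed block-pair edge counts. It assumes the simplest SBM variant, in which those counts are the sufficient statistics. The edge-relocation move, the sum-of-squared-degrees statistic, and all function names are assumptions made for this example, not the Markov bases or test statistics constructed in the paper.

```python
import random
from collections import Counter
from itertools import combinations

def pair_key(i, j, blocks):
    """Unordered pair of block labels for the dyad {i, j}."""
    a, b = blocks[i], blocks[j]
    return (a, b) if a <= b else (b, a)

def sufficient_stats(edges, blocks):
    """Edge counts per block pair (assumed sufficient statistics)."""
    return Counter(pair_key(i, j, blocks) for i, j in edges)

def sum_sq_degrees(edges, n):
    """Illustrative test statistic; a hypothetical choice for this sketch."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return sum(d * d for d in deg)

def fiber_walk_pvalue(edges, blocks, steps=5000, seed=0):
    """Approximate p-value from a walk on the fiber of the observed graph.

    Each step proposes relocating one edge to a dyad in the same block
    pair. The proposal is symmetric, so the walk targets the uniform
    distribution on the fiber. Successive samples are correlated, so the
    result is only an estimate.
    """
    rng = random.Random(seed)
    n = len(blocks)
    # Group all dyads (i < j) by their block pair.
    dyads = {}
    for i, j in combinations(range(n), 2):
        dyads.setdefault(pair_key(i, j, blocks), []).append((i, j))
    current = sorted((min(e), max(e)) for e in edges)
    if not current:
        return 1.0, current  # empty graph: the fiber is a single point
    present = set(current)
    observed = sum_sq_degrees(current, n)
    extreme = 0
    for _ in range(steps):
        idx = rng.randrange(len(current))
        old = current[idx]
        new = rng.choice(dyads[pair_key(old[0], old[1], blocks)])
        if new not in present:  # otherwise stay at the current graph
            present.remove(old)
            present.add(new)
            current[idx] = new
        if sum_sq_degrees(current, n) >= observed:
            extreme += 1
    # Count the observed graph itself so the estimate is never zero.
    return (1 + extreme) / (1 + steps), current

if __name__ == "__main__":
    blocks = [0, 0, 0, 1, 1, 1]  # known block assignment (toy data)
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (0, 3)]
    p_value, _ = fiber_walk_pvalue(edges, blocks, steps=2000, seed=1)
    print(f"estimated p-value: {p_value:.3f}")
```

Because every move relocates an edge within its own block pair, each visited graph has the same sufficient statistics as the observed one, which is what makes the reference distribution free of the unknown model parameters.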