Mechanistic network models specify the mechanisms by which networks grow and change, allowing researchers to investigate complex systems using both simulation and analytical techniques. Unfortunately, it is difficult to write likelihoods for graphs generated with mechanistic models because repeated applications of the mechanism produce a combinatorial explosion of possible outcomes. As a result, it is nearly impossible to estimate the model parameters by maximum likelihood. In this paper, we propose treating the node sequence in a growing network model as an additional parameter, or as a missing random variable, and maximizing over the resulting likelihood. We develop this framework in the context of a simple mechanistic network model used to study gene duplication and divergence, and we test a variety of algorithms for maximizing the likelihood on simulated graphs. We also run the best-performing algorithm on a human protein-protein interaction network and four non-human protein-protein interaction networks. Although we focus on a specific mechanistic network model here, the proposed framework is more generally applicable to reversible models.
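To make the role of the node sequence concrete, the following is a minimal sketch of a generic duplication-divergence growth process. It is not the specific model studied in the paper: the function name `grow_duplication_divergence`, the two-node seed graph, the uniform choice of parent, and the single edge-retention parameter `retain_prob` are all assumptions made for illustration.

```python
import random


def grow_duplication_divergence(n_nodes, retain_prob, seed=None):
    """Grow an undirected graph by repeated duplication and divergence.

    Illustrative only: the seed graph, the uniform parent choice, and the
    single retention probability are assumptions, not the paper's model.
    Returns (adjacency dict, node arrival order).
    """
    rng = random.Random(seed)
    # Assumed seed graph: two nodes joined by a single edge.
    adj = {0: {1}, 1: {0}}
    order = [0, 1]
    while len(adj) < n_nodes:
        new = len(adj)
        # Duplication: pick an existing node uniformly at random to copy.
        parent = rng.choice(order)
        # Divergence: keep each copied edge independently with retain_prob.
        kept = {v for v in adj[parent] if rng.random() < retain_prob}
        adj[new] = kept
        for v in kept:
            adj[v].add(new)
        order.append(new)
    return adj, order


if __name__ == "__main__":
    adj, order = grow_duplication_divergence(50, retain_prob=0.4, seed=1)
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    print(f"{len(adj)} nodes, {n_edges} edges")
```

In the simulation, `order` is known. In an observed network only `adj` is available, with arbitrary node labels, so the arrival order has to be treated as unknown. This is the quantity the abstract proposes handling as an additional parameter or as a missing random variable.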