The last two decades have witnessed considerable progress on foundational aspects of statistical network analysis, but less attention has been paid to the complex statistical issues arising in real-world applications. Here, we consider two samples of within-household contact networks in Belgium generated by different but complementary sampling designs: one smaller but with all contacts in each household observed, the other larger and more representative but recording contacts of only one person per household. We wish to combine their strengths to learn the social forces that shape household contact formation and facilitate simulation for prediction of disease spread, while generalising to the population of households in the region. To accomplish this, we introduce a flexible framework for specifying multi-network models in the exponential family class and identify the requirements for inference and prediction under this framework to be consistent, identifiable, and generalisable, even when data are incomplete; explore how these requirements may be violated in practice; and develop a suite of quantitative and graphical diagnostics for detecting violations and suggesting improvements to a candidate model. We report on the effects of network size, geography, and household roles on household contact patterns (activity, heterogeneity in activity, and triadic closure).