The emergence of social networks and the definition of suitable generative models for synthetic yet realistic social graphs are widely studied problems in the literature. Because they are not tied to any real data, random graph models cannot capture all the subtleties of real networks and are inadequate in many practical contexts, including areas of research such as computational epidemiology, which have recently risen high on the agenda. At the same time, so-called contact networks describe interactions rather than relationships, and they depend strongly on the application and on the size and quality of the sample data used to infer them. To fill the gap between these two approaches, we present a data-driven model for urban social networks, implemented and released as open-source software. Given a territory of interest, and relying only on widely available aggregated demographic and social-mixing data, we construct an age-stratified and geo-referenced synthetic population whose individuals are connected by "strong ties" of two types: intra-household (e.g., kinship) or friendship. While household links are entirely data-driven, we propose a parametric probabilistic model for friendship, based on the assumption that distances and age differences play a role, and that not all individuals are equally sociable. The demographic and geographic factors governing the structure of the resulting network under different configurations are thoroughly studied through extensive simulations focused on three Italian cities of different sizes.
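The abstract does not state the functional form of the friendship model, only that distance, age difference, and individual sociability enter it. The snippet below is a minimal, hypothetical sketch of how such a parametric model could be wired up, and it is not the authors' model. Everything in it is an assumption for illustration: the exponential distance and age-gap kernels, the parameters `scale`, `dist_km`, and `age_years`, the per-person `sociability` weights drawn from an exponential distribution, the planar coordinates, and the fixed three-person households.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    pid: int
    household: int  # hypothetical household id (households come from data in the paper)
    age: int
    x: float  # hypothetical planar coordinates, in km
    y: float
    sociability: float  # hypothetical per-person weight >= 0


def friendship_prob(a: Person, b: Person, *, scale: float = 0.05,
                    dist_km: float = 1.0, age_years: float = 5.0) -> float:
    """Hypothetical kernel: decays exponentially with distance and age gap,
    is scaled by both individuals' sociability, and is clipped to [0, 1]."""
    d = math.hypot(a.x - b.x, a.y - b.y)
    gap = abs(a.age - b.age)
    p = scale * a.sociability * b.sociability
    p *= math.exp(-d / dist_km) * math.exp(-gap / age_years)
    return min(1.0, p)


def sample_friendships(people, rng, **params):
    """Draw each non-household pair independently with friendship_prob.
    Returns a set of (pid_i, pid_j) pairs with pid_i < pid_j."""
    edges = set()
    for i in range(len(people)):
        for j in range(i + 1, len(people)):
            a, b = people[i], people[j]
            if a.household == b.household:
                continue  # household ties are modelled separately
            if rng.random() < friendship_prob(a, b, **params):
                edges.add((a.pid, b.pid))
    return edges


# Usage: a toy population of 300 people in a 5 km x 5 km square,
# grouped into hypothetical households of three.
rng = random.Random(42)
people = [Person(pid=k, household=k // 3, age=rng.randint(5, 85),
                 x=rng.uniform(0, 5), y=rng.uniform(0, 5),
                 sociability=rng.expovariate(1.0)) for k in range(300)]
friends = sample_friendships(people, rng)
```

Drawing sociability from a heavy-ish distribution is one simple way to encode "not all individuals are equally sociable": a few people get many ties while most get few. Because pairs are sampled independently, the cost is quadratic in population size, so a city-scale implementation would need spatial indexing or a different sampling scheme.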