Discrepancies in Epidemiological Modeling of Aggregated Heterogeneous Data

Most analyses in epidemiological modeling assume that a single epidemic process generates the ground-truth data. This assumption can be unrealistic, because data sources for epidemics are often aggregated across geographic regions and communities. As a result, state-of-the-art models for estimating epidemiological parameters, e.g., transmission rates, can be inappropriate when applied to complex systems. Our work empirically demonstrates some limitations of applying epidemiological models to aggregated datasets. We generate three complex outbreak scenarios by combining incidence curves from multiple epidemics that are independently simulated via SEIR models with different sets of parameters. Using these scenarios, we assess the robustness of a state-of-the-art Bayesian inference method that estimates the epidemic trajectory from viral load surveillance data. We evaluate two data-generating models within this Bayesian inference framework: a simple exponential growth model and a highly flexible Gaussian process prior model. Our results show that both models generate accurate transmission rate estimates for the combined incidence curve, at the cost of biased estimates for each underlying epidemic, which reflects the highly heterogeneous underlying population dynamics. The exponential growth model, while interpretable, is unable to capture the complexity of the underlying epidemics. With sufficient surveillance data, the Gaussian process prior model captures the shape of complex trajectories, but it is imprecise for periods of low data coverage. Thus, our results highlight the potential pitfalls of neglecting complexity and heterogeneity in the data generation process, which can mask underlying location- and population-specific epidemic dynamics.
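The aggregation effect described above can be illustrated with a minimal sketch. It is not the Bayesian viral-load inference method evaluated in this work: it assumes a deterministic SEIR model integrated with forward Euler steps, and it swaps in a simple log-linear least-squares fit as a stand-in for an exponential growth model. All parameter values and function names (`simulate_seir`, `combine`, `log_linear_growth_rate`) are hypothetical and chosen only for illustration.

```python
import math

def simulate_seir(beta, sigma, gamma, population, initial_infected, days,
                  steps_per_day=10):
    """Deterministic SEIR (forward Euler); returns new infections per day."""
    s = float(population - initial_infected)
    e, i = 0.0, float(initial_infected)
    dt = 1.0 / steps_per_day
    incidence = []
    for _ in range(days):
        new_today = 0.0
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt   # S -> E
            new_infectious = sigma * e * dt                # E -> I
            new_recovered = gamma * i * dt                 # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            new_today += new_exposed
        incidence.append(new_today)
    return incidence

def combine(curves):
    """Aggregate several incidence curves by summing them day by day."""
    return [sum(values) for values in zip(*curves)]

def log_linear_growth_rate(incidence, start, end):
    """Least-squares slope of log(incidence) over days start..end-1."""
    xs = list(range(start, end))
    ys = [math.log(incidence[t]) for t in xs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Two hypothetical epidemics that differ in transmission rate and seeding.
DAYS = 60
curve_a = simulate_seir(beta=0.3, sigma=0.2, gamma=0.1,
                        population=1_000_000, initial_infected=100, days=DAYS)
curve_b = simulate_seir(beta=0.6, sigma=0.2, gamma=0.1,
                        population=1_000_000, initial_infected=10, days=DAYS)
combined = combine([curve_a, curve_b])

# Fit over an early window in which both epidemics grow roughly exponentially.
rate_a = log_linear_growth_rate(curve_a, 15, 30)
rate_b = log_linear_growth_rate(curve_b, 15, 30)
rate_combined = log_linear_growth_rate(combined, 15, 30)

print(f"epidemic A growth rate: {rate_a:.3f} per day")
print(f"epidemic B growth rate: {rate_b:.3f} per day")
print(f"combined growth rate:   {rate_combined:.3f} per day")
```

In this toy setup, the growth rate fitted to the combined curve falls between the two underlying rates, so it describes the aggregate while matching neither epidemic. That is the kind of masking the abstract refers to, shown here under much simpler assumptions than the scenarios studied in the work.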
