The computational cost of exact likelihood evaluation for partially observed and highly heterogeneous individual-based models grows exponentially with the population size; therefore, inference relies on approximations. Sampling-based approaches to this problem, such as Sequential Monte Carlo or Approximate Bayesian Computation, usually require simulating every individual in the population multiple times, rely heavily on the design of bespoke proposal distributions or summary statistics, and can still scale poorly with population size. To overcome this, we propose a deterministic recursive approach to approximating the likelihood function using categorical distributions. The resulting algorithm has a computational cost as low as linear in the population size and is amenable to automatic differentiation, leading to simple algorithms for maximizing this approximate likelihood or sampling from posterior distributions. We prove consistency of the maximum approximate likelihood estimator of model parameters. We empirically test our approach on a range of models with various flavors of heterogeneity: different sets of disease states, individual-specific susceptibility and infectivity, spatial interaction mechanisms, under-reporting, and mis-reporting. We demonstrate strong calibration performance, in terms of log-likelihood variance and ground-truth recovery, and computational advantages over competitor methods. We conclude by illustrating the effectiveness of our approach in a real-world, large-scale application using foot-and-mouth disease data from the 2001 outbreak in the United Kingdom.
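To show how a deterministic recursion over per-individual categorical distributions can run in time linear in the population size, here is a minimal sketch under assumptions that are ours and not the paper's. It uses a two-state SIS model with homogeneous mean-field mixing and a hypothetical per-step recovery probability `gamma`. Under-reporting is modelled as infected individuals being reported with probability `rho` and susceptible individuals never being reported. The marginals are fully factorized across individuals. The function name `approx_log_likelihood` and every parameter are hypothetical. Treat this as an illustration of the general idea, not the authors' algorithm.

```python
import math
from typing import List, Sequence


def approx_log_likelihood(
    observations: Sequence[Sequence[int]],  # observations[t][n] in {0, 1}; needs >= 1 step
    init_infected_prob: float,  # hypothetical prior P(infected) for every individual
    beta: float,   # hypothetical transmission rate (mean-field mixing assumed)
    gamma: float,  # hypothetical per-step recovery probability (I -> S)
    rho: float,    # hypothetical probability that an infected individual is reported
) -> float:
    """Factorized categorical filter for an assumed SIS model; cost O(N) per time step."""
    n_pop = len(observations[0])
    # Each individual's state is a categorical over {S, I}; store only P(I).
    p_inf: List[float] = [init_infected_prob] * n_pop
    log_lik = 0.0
    for y_t in observations:
        total = sum(p_inf)  # computed once per step, which keeps the step linear in N
        new_p: List[float] = []
        for n in range(n_pop):
            # Expected infection pressure from the *other* individuals (mean-field).
            pressure = beta * (total - p_inf[n]) / n_pop
            p_s_to_i = 1.0 - math.exp(-pressure)
            # Prediction: push the categorical through the transition matrix.
            pred = p_inf[n] * (1.0 - gamma) + (1.0 - p_inf[n]) * p_s_to_i
            # Correction: only infected individuals can be reported.
            if y_t[n] == 1:
                lik = rho * pred
                post = 1.0
            else:
                lik = 1.0 - rho * pred
                post = pred * (1.0 - rho) / lik
            log_lik += math.log(lik)  # predictive probability of y_t[n]
            new_p.append(post)
        p_inf = new_p  # synchronous update across the population
    return log_lik


# Usage: three individuals observed for two steps.
obs = [[0, 1, 0], [1, 1, 0]]
print(approx_log_likelihood(obs, init_infected_prob=0.2, beta=1.5, gamma=0.3, rho=0.8))
```

Because the recursion is deterministic and each step is a smooth function of the parameters for fixed data, it could be rewritten in an automatic-differentiation framework such as JAX and differentiated with respect to `beta`, `gamma` and `rho`. This matches the abstract's point that the approach is amenable to gradient-based fitting, although the exact formulation used in the paper may differ.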