Simulation models of scientific interest often lack a tractable likelihood function, which precludes standard likelihood-based statistical inference. A popular likelihood-free method for inferring simulator parameters is approximate Bayesian computation (ABC), in which an approximate posterior is sampled by comparing simulator output with the observed data. However, effective measures of closeness between simulated and observed data are generally difficult to construct, particularly for time series, which are often high-dimensional and structurally complex. Existing approaches typically either rely on manually constructed summary statistics, which require substantial domain expertise and experimentation, or make unrealistic assumptions such as independent and identically distributed (i.i.d.) data. Others are unsuitable for more complex settings such as multivariate or irregularly sampled time series. In this paper, we introduce path signatures as a natural candidate feature set for constructing distances between time series in ABC algorithms. Our experiments show that this approach can yield more accurate approximate Bayesian posteriors than existing techniques for time series models.
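The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea under stated assumptions, not the paper's method. It assumes a depth-2 truncated signature computed exactly for the piecewise-linear interpolation of the data, time augmentation (adding the observation times as a channel, which is one common way to encode irregular sampling), a Euclidean distance on standardized signature features, and plain rejection ABC that keeps the closest prior draws. The simulator `simulate` (Brownian motion with drift), the Uniform(-2, 2) prior, and all function names are hypothetical.

```python
import numpy as np


def signature_depth2(path):
    """Depth-2 truncated signature of the piecewise-linear path through the rows of `path`.

    path: array of shape (n_points, d). Returns a vector of length d + d*d:
    the level-1 terms followed by the flattened level-2 terms.
    """
    inc = np.diff(path, axis=0)              # segment increments, shape (n_points - 1, d)
    level1 = inc.sum(axis=0)                 # S^i: total increment of channel i
    before = np.cumsum(inc, axis=0) - inc    # increment accumulated before each segment
    # Chen's identity applied segment by segment (a linear segment has S^{ij} = inc^i inc^j / 2):
    # S^{ij} = sum_k before_k^i * inc_k^j + 0.5 * inc_k^i * inc_k^j
    level2 = before.T @ inc + 0.5 * (inc.T @ inc)
    return np.concatenate([level1, level2.ravel()])


def features(times, values):
    """Signature features of a (possibly irregularly sampled) series via time augmentation."""
    return signature_depth2(np.column_stack([times, values]))


def simulate(theta, times, rng):
    """Hypothetical toy simulator: Brownian motion with drift theta, observed at `times`."""
    dt = np.diff(times)
    steps = theta * dt + np.sqrt(dt) * rng.standard_normal(dt.size)
    return np.concatenate([[0.0], np.cumsum(steps)])


def abc_rejection(obs_values, times, n_sims=2000, n_keep=50, seed=0):
    """Rejection ABC: keep the prior draws whose signature features are closest to the data."""
    rng = np.random.default_rng(seed)
    thetas = rng.uniform(-2.0, 2.0, size=n_sims)   # hypothetical Uniform(-2, 2) prior
    sims = np.array([features(times, simulate(th, times, rng)) for th in thetas])
    obs = features(times, obs_values)
    scale = sims.std(axis=0)
    scale[scale < 1e-12] = 1.0   # terms involving only the time channel are constant
    dist = np.linalg.norm((sims - obs) / scale, axis=1)
    return thetas[np.argsort(dist)[:n_keep]]       # accepted samples


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # irregular observation times on [0, 200]
    times = np.concatenate([[0.0], np.sort(rng.uniform(0.0, 200.0, size=200))])
    observed = simulate(0.5, times, rng)
    posterior = abc_rejection(observed, times)
    print(f"approximate posterior mean: {posterior.mean():.3f}")
```

Standardizing each signature coordinate by its spread across the prior simulations is one simple design choice that keeps large second-level terms from dominating the distance. Higher truncation depths, other augmentations, or other distances on signatures are equally plausible choices.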