Many issues can complicate the inference of model parameters from data. Both data and models are imperfect, so there are several scenarios in which standard inference methods lead to misleading conclusions: corrupted data, models that represent only a subset of the data, or multiple regions of the data that the model fits best with different parameters. Methods exist for excluding some types of anomalous data, but in practice data cleaning is often done by hand before models are fitted. In this work, we introduce the concept of Bayesian data selection: the simultaneous inference of both the model parameters and a set of parameters representing our belief that each observation should be included in the inference. The aim, within a Bayesian setting, is to find the regions of observation space in which the model represents the data well, together with the corresponding model parameters for those regions. We explore a number of approaches and apply them to test problems in linear regression and to the problem of fitting to data an ODE model approximated by a finite difference method. The approaches are extremely simple to implement, can aid the mixing of Markov chains designed to sample from the resulting densities, and are broadly applicable to the majority of inferential problems. As such, this approach has the potential to change the way we conduct and interpret the fitting of models to data.
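As a rough illustration of jointly inferring model parameters and per-observation inclusion parameters, the Python sketch below applies the idea to straight-line regression. It is not the method of this work. It assumes binary inclusion flags, a known noise level `sigma`, a fixed prior inclusion probability `p_include`, and that excluded observations are explained by a broad Gaussian background density (a standard outlier-mixture construction swapped in for illustration). The function `bayesian_data_selection` and all of its parameter names are hypothetical. Each iteration alternates a Gibbs draw of the flags given the line with a random-walk Metropolis step on the intercept and slope given the flags.

```python
import numpy as np


def log_normal(r, s):
    """Log-density of N(0, s^2) evaluated at r."""
    return -0.5 * (r / s) ** 2 - np.log(s * np.sqrt(2.0 * np.pi))


def bayesian_data_selection(x, y, n_iter=8000, burn_in=2000, sigma=0.5,
                            background_sd=10.0, p_include=0.5,
                            step=(0.1, 0.02), seed=0):
    """Hypothetical sampler for y ~ theta[0] + theta[1] * x with inclusion flags z.

    Assumptions (illustrative only): included points (z_i = 1) follow
    N(theta[0] + theta[1] * x_i, sigma^2) with sigma known; excluded points
    (z_i = 0) follow a broad background density N(0, background_sd^2).
    """
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float), np.asarray(y, float)
    step = np.asarray(step, float)
    log_bg = log_normal(y, background_sd)  # background term, independent of theta

    def point_loglik(theta):
        return log_normal(y - (theta[0] + theta[1] * x), sigma)

    def log_target(theta, z):
        # Weak N(0, 10^2) prior on each parameter plus likelihood of included points only.
        return -0.5 * np.sum(theta ** 2) / 100.0 + np.sum(point_loglik(theta)[z])

    theta = np.polyfit(x, y, 1)[::-1]  # start at least squares: (intercept, slope)
    thetas, z_sum = [], np.zeros(len(y))
    for it in range(n_iter):
        # Gibbs step: draw each flag from its full conditional given theta.
        a = np.log(p_include) + point_loglik(theta)
        b = np.log1p(-p_include) + log_bg
        z = rng.random(len(y)) < np.exp(a - np.logaddexp(a, b))
        # Random-walk Metropolis step for theta given the current flags.
        proposal = theta + step * rng.standard_normal(2)
        if np.log(rng.random()) < log_target(proposal, z) - log_target(theta, z):
            theta = proposal
        if it >= burn_in:
            thetas.append(theta.copy())
            z_sum += z
    return np.array(thetas), z_sum / (n_iter - burn_in)


# Usage on synthetic data: a straight line with a few corrupted observations.
data_rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)
y = 1.0 + 2.0 * x + 0.5 * data_rng.standard_normal(50)
outliers = np.array([3, 17, 25, 38, 44])
y[outliers] += 15.0  # corrupt a few observations
samples, inclusion = bayesian_data_selection(x, y)
print("posterior mean (intercept, slope):", samples.mean(axis=0))
print("inclusion probability of corrupted points:", inclusion[outliers])
```

In this sketch the background density is what stops the sampler from trivially excluding every observation: dropping a point only pays off when the line explains it worse than the background does. The corrupted points should therefore receive inclusion probabilities near zero, while the estimated slope stays close to the value used to generate the clean data.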