In statistical inference, uncertainty is unknown and all models are wrong; that is, a person who makes a statistical model and a prior distribution is simultaneously aware that both are fictional candidates. To study such cases, statistical measures have been constructed, such as cross validation, information criteria, and marginal likelihood; however, their mathematical properties have not yet been completely clarified when statistical models are under- and over-parametrized. We introduce a framework of mathematical theory of Bayesian statistics for unknown uncertainty, which clarifies general properties of cross validation, information criteria, and marginal likelihood, even if an unknown data-generating process is unrealizable by a model or even if the posterior distribution cannot be approximated by any normal distribution. Hence it gives a helpful standpoint for a person who cannot believe in any specific model and prior. This paper consists of three parts. The first is a new result, whereas the second and third are well-known previous results accompanied by new experiments. We show that there exists a more precise estimator of the generalization loss than leave-one-out cross validation, that there exists a more accurate approximation of the marginal likelihood than BIC, and that the optimal hyperparameters for the generalization loss and the marginal likelihood are different.
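For concreteness, the quantities compared above can be estimated from posterior samples. The following is a minimal numerical sketch, not part of the paper's experiments: it assumes a normal model with known variance and a conjugate normal prior (all data sizes, priors, and seeds here are illustrative), and computes both the importance-sampling form of leave-one-out cross validation and a WAIC-type information criterion (training loss plus functional variance) from the pointwise log-likelihood matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n observations from N(0.5, 1); model x ~ N(mu, 1), prior mu ~ N(0, 10^2).
n = 50
x = rng.normal(0.5, 1.0, size=n)

# Conjugate posterior for mu is N(m_post, s_post2).
prior_var = 10.0 ** 2
s_post2 = 1.0 / (n + 1.0 / prior_var)
m_post = s_post2 * x.sum()

# Draw posterior samples of mu.
S = 20000
mu = rng.normal(m_post, np.sqrt(s_post2), size=S)

# Pointwise log-likelihood matrix, shape (S, n): log p(x_i | mu_s).
loglik = -0.5 * np.log(2.0 * np.pi) - 0.5 * (x[None, :] - mu[:, None]) ** 2

def logmeanexp(a, axis=0):
    """Numerically stable log of the mean of exp(a) along an axis."""
    m = a.max(axis=axis)
    return m + np.log(np.exp(a - m).mean(axis=axis))

# Importance-sampling leave-one-out CV loss:
#   (1/n) sum_i log E_posterior[ 1 / p(x_i | mu) ].
cv = logmeanexp(-loglik, axis=0).mean()

# WAIC-type criterion: Bayes training loss plus functional variance / n, where
#   training loss = -(1/n) sum_i log E_posterior[ p(x_i | mu) ],
#   functional variance = sum_i Var_posterior[ log p(x_i | mu) ].
train_loss = -logmeanexp(loglik, axis=0).mean()
fv = loglik.var(axis=0).sum()
waic = train_loss + fv / n

print(f"LOO-CV loss: {cv:.4f}, WAIC: {waic:.4f}")
```

In this realizable, regular toy case the two estimators agree closely (their difference is of higher order in 1/n); the theory summarized in the abstract concerns their behavior when such regularity fails.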