Leave-one-out cross-validation (LOO-CV) is a popular method for comparing Bayesian models based on their estimated predictive performance on new, unseen, data. As leave-one-out cross-validation is based on finite observed data, there is uncertainty about the expected predictive performance on new data. By modeling this uncertainty when comparing two models, we can compute the probability that one model has a better predictive performance than the other. Modeling this uncertainty well is not trivial, and for example, it is known that the commonly used standard error estimate is often too small. We study the properties of the Bayesian LOO-CV estimator and the related uncertainty estimates when comparing two models. We provide new results of the properties both theoretically in the linear regression case and empirically for multiple different models and discuss the challenges of modeling the uncertainty. We show that problematic cases include: comparing models with similar predictions, misspecified models, and small data. In these cases, there is a weak connection in the skewness of the individual leave-one-out terms and the distribution of the error of the Bayesian LOO-CV estimator. We show that it is possible that the problematic skewness of the error distribution, which occurs when the models make similar predictions, does not fade away when the data size grows to infinity in certain situations. Based on the results, we also provide practical recommendations for the users of Bayesian LOO-CV for model comparison.
翻译:暂无翻译