Real-world problems, often couched as machine learning applications, involve quantities of interest that have real-world meaning, independent of any statistical model. To avoid potential model misspecification bias or over-complicating the problem formulation, a direct, model-free approach is desirable. The traditional Bayesian framework relies on a model for the data-generating process, so, apparently, the desired direct, model-free, posterior-probabilistic inference is out of reach. Fortunately, likelihood functions are not the only means of linking data and quantities of interest. Loss functions provide an alternative link, where the quantity of interest is defined, or at least could be defined, as a minimizer of the corresponding risk, or expected loss. In this case, one can obtain what is commonly referred to as a Gibbs posterior distribution by using the empirical risk function directly. This manuscript explores the Gibbs posterior construction, its asymptotic concentration properties, and the frequentist calibration of its credible regions. Free from the constraints of model specification, Gibbs posteriors create new opportunities for probabilistic inference in modern statistical learning problems.
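To fix ideas, here is a sketch of the standard Gibbs posterior construction the abstract refers to; the notation (loss $\ell_\theta$, risk $R$, empirical risk $R_n$, prior $\Pi$, learning rate $\omega$) is introduced here for illustration and is not taken from the text above. The quantity of interest is a risk minimizer, and the empirical risk stands in where a negative log-likelihood would appear in an ordinary Bayesian update:

\[
\theta^\star \;=\; \arg\min_{\theta \in \Theta} R(\theta),
\qquad
R(\theta) \;=\; \mathbb{E}\bigl[\ell_\theta(X)\bigr],
\qquad
R_n(\theta) \;=\; \frac{1}{n}\sum_{i=1}^n \ell_\theta(X_i),
\]
\[
\Pi_n(d\theta) \;\propto\; \exp\{-\omega\, n\, R_n(\theta)\}\,\Pi(d\theta), \qquad \omega > 0,
\]

where $X_1,\dots,X_n$ are the data and the learning rate $\omega$ is the tuning parameter whose choice governs the frequentist calibration of the resulting credible regions.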
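As a concrete, runnable illustration of "using the empirical risk function directly," the sketch below computes a Gibbs posterior on a grid for a toy setup that is not in the abstract: the quantity of interest is the $\tau$-th quantile, linked to the data via the check loss, with a flat prior and a fixed learning rate $\omega = 1$.

```python
import numpy as np

# Minimal sketch of a Gibbs posterior, under illustrative assumptions:
# quantity of interest = tau-th quantile, loss = check (pinball) loss,
# prior = flat on a grid, learning rate omega = 1 (in practice, omega
# is tuned, e.g., to calibrate credible regions).

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=200)  # data; any distribution works

tau = 0.5     # quantile level (median)
omega = 1.0   # learning rate

def check_loss(theta, x, tau):
    """Empirical check loss; its risk minimizer is the tau-th quantile."""
    u = x - theta
    return np.mean(np.where(u >= 0, tau * u, (tau - 1) * u))

# Gibbs posterior density on a grid:
#   pi_n(theta) proportional to exp(-omega * n * R_n(theta)) * prior(theta)
grid = np.linspace(x.min(), x.max(), 1000)
log_post = np.array([-omega * len(x) * check_loss(t, x, tau) for t in grid])
log_post -= log_post.max()        # stabilize before exponentiating
post = np.exp(log_post)
dtheta = grid[1] - grid[0]
post /= post.sum() * dtheta       # normalize numerically

# Posterior mean as a point summary; compare with the empirical quantile.
print("Gibbs posterior mean:", (grid * post).sum() * dtheta)
print("Empirical quantile  :", np.quantile(x, tau))
```

The grid-based normalization is only for this one-dimensional toy; in higher dimensions one would sample from $\Pi_n$ with MCMC instead, but the construction, exponentiating the negative scaled empirical risk against a prior, is the same.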