Regression Diagnostics meets Forecast Evaluation: Conditional Calibration, Reliability Diagrams, and Coefficient of Determination

Model diagnostics and forecast evaluation are two sides of the same coin. A common principle is that fitted or predicted distributions ought to be calibrated or reliable, ideally in the sense of auto-calibration, where the outcome is a random draw from the posited distribution. For binary outcomes, this is a universal concept of reliability. For general real-valued outcomes, practitioners and theoreticians have relied on weaker, unconditional notions, most prominently probabilistic calibration, which corresponds to uniformity of the probability integral transform. Conditional concepts give rise to hierarchies of calibration. In a nutshell, a predictive distribution is conditionally T-calibrated if it can be taken at face value in terms of the functional T. Whenever T is defined via an identification function (as in the cases of threshold (non-)exceedance probabilities, quantiles, expectiles, and moments), auto-calibration implies T-calibration. However, the notion of T-calibration also applies to stand-alone point forecasts or regression output in terms of the functional T. We introduce population versions of T-reliability diagrams and revisit a score decomposition into measures of miscalibration (MCB), discrimination (DSC), and uncertainty (UNC). In empirical settings, stable and efficient estimators of T-reliability diagrams and score components arise via nonparametric isotonic regression and the pool-adjacent-violators algorithm. For in-sample model diagnostics, we propose a universal coefficient of determination, $$\text{R}^\ast = \frac{\text{DSC}-\text{MCB}}{\text{UNC}},$$ that nests and reinterprets the classical $\text{R}^2$ in least squares (mean) regression and its natural analogue $\text{R}^1$ in quantile regression, yet applies to T-regression in general, with MCB $\geq 0$, DSC $\geq 0$, and $\text{R}^\ast \in [0,1]$ under modest conditions.
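As an illustration, the empirical decomposition can be sketched for the special case where T is the mean and the scoring function is squared error. This is a minimal sketch under stated assumptions, not the authors' implementation. The recalibrated forecasts come from a pool-adjacent-violators (PAV) isotonic fit of the outcomes on the forecasts, with tied forecast values pooled first. The reference forecast is assumed to be the sample mean of the outcomes. The function names `pav_mean` and `mean_decomposition` are hypothetical. Writing $\bar{S}$, $\hat{S}$, and $S_{\text{ref}}$ for the mean scores of the original, recalibrated, and reference forecasts, the sketch uses $\text{MCB} = \bar{S} - \hat{S}$, $\text{DSC} = S_{\text{ref}} - \hat{S}$, and $\text{UNC} = S_{\text{ref}}$. These definitions imply $\bar{S} = \text{MCB} - \text{DSC} + \text{UNC}$ and $\text{R}^\ast = 1 - \bar{S}/S_{\text{ref}}$.

```python
import numpy as np


def pav_mean(x, y):
    """Nondecreasing least-squares fit of y on x via pool-adjacent-violators.

    Hypothetical helper: observations with equal forecast x are pooled into
    one initial block, so equal forecasts receive equal fitted values.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x, kind="stable")
    xs, ys = x[order], y[order]
    # Start indices of runs of equal x in the sorted array (ties pooled).
    _, starts = np.unique(xs, return_index=True)
    bounds = list(starts) + [len(xs)]
    sums, counts = [], []
    for a, b in zip(bounds[:-1], bounds[1:]):
        sums.append(ys[a:b].sum())
        counts.append(b - a)
        # Merge with the previous block while block means decrease.
        while len(sums) > 1 and sums[-2] / counts[-2] > sums[-1] / counts[-1]:
            s_last, n_last = sums.pop(), counts.pop()
            sums[-1] += s_last
            counts[-1] += n_last
    fitted_sorted = np.repeat([s / n for s, n in zip(sums, counts)], counts)
    fitted = np.empty_like(fitted_sorted)
    fitted[order] = fitted_sorted  # undo the sort
    return fitted


def mean_decomposition(x, y):
    """MCB/DSC/UNC decomposition of mean squared error and R* (assumed setup)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_rc = pav_mean(x, y)              # recalibrated forecasts
    r = y.mean()                       # assumed reference: marginal mean
    s_bar = np.mean((x - y) ** 2)      # score of the original forecasts
    s_rc = np.mean((x_rc - y) ** 2)    # score of the recalibrated forecasts
    s_ref = np.mean((r - y) ** 2)      # score of the reference forecast
    mcb, dsc, unc = s_bar - s_rc, s_ref - s_rc, s_ref
    return {"S": s_bar, "MCB": mcb, "DSC": dsc, "UNC": unc,
            "R*": (dsc - mcb) / unc}


# Usage: in-sample least squares fit with intercept on simulated data.
rng = np.random.default_rng(0)
z = rng.normal(size=500)
y = 1.0 + 2.0 * z + rng.normal(size=500)
slope, intercept = np.polyfit(z, y, 1)
x = intercept + slope * z
res = mean_decomposition(x, y)
r2 = 1 - np.sum((y - x) ** 2) / np.sum((y - y.mean()) ** 2)
print(res["R*"], r2)  # agree up to floating-point error
```

Because the identity map and every constant are themselves nondecreasing functions of the forecast, the isotonic fit scores no worse than either. That is why MCB and DSC are nonnegative in this sketch.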
