Effective decision making requires understanding the uncertainty inherent in a prediction. In regression, this uncertainty can be estimated by a variety of methods; however, many of these methods are laborious to tune, generate overconfident uncertainty intervals, or lack sharpness (give imprecise intervals). We address these challenges by proposing a novel method to capture predictive distributions in regression by defining two neural networks with two distinct loss functions. Specifically, one network approximates the cumulative distribution function, and the second network approximates its inverse. We refer to this method as Collaborating Networks (CN). Theoretical analysis demonstrates that a fixed point of the optimization is at the idealized solution, and that the method is asymptotically consistent to the ground truth distribution. Empirically, learning is straightforward and robust. We benchmark CN against several common approaches on two synthetic and six real-world datasets, including forecasting A1c values in diabetic patients from electronic health records, where uncertainty is critical. In the synthetic data, the proposed approach essentially matches ground truth. In the real-world datasets, CN improves results on many performance metrics, including log-likelihood estimates, mean absolute errors, coverage estimates, and prediction interval widths.
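The core relationship the abstract describes can be made concrete with a small sketch. This is not the paper's implementation: here `statistics.NormalDist` stands in for the two learned networks, purely to illustrate the self-consistency property that holds at the idealized fixed point, where one network `f` matches the conditional CDF and the other, `g`, matches its inverse.

```python
# Illustrative sketch only (not the CN training procedure): NormalDist is a
# stand-in for the two learned networks at the idealized solution.
from statistics import NormalDist

dist = NormalDist(mu=0.0, sigma=1.0)  # hypothetical ground truth for one input x
f = dist.cdf       # plays the role of the CDF network f(y | x)
g = dist.inv_cdf   # plays the role of the inverse-CDF network g(q | x)

# At the fixed point, f(g(q)) = q for every quantile level q -- the
# self-consistency the two collaborating losses jointly encourage.
for q in (0.05, 0.5, 0.95):
    assert abs(f(g(q)) - q) < 1e-9

# A 90% prediction interval then falls directly out of g:
low, high = g(0.05), g(0.95)
```

Once `g` is trained, prediction intervals at any coverage level come from evaluating it at the corresponding quantile pair, with no post-hoc calibration step.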