We develop a method to generate prediction intervals that have a user-specified coverage level across all regions of feature space, a property called conditional coverage. A typical approach to this task is to estimate the conditional quantiles with quantile regression; it is well known that this yields correct coverage in the large-sample limit, although coverage may be inaccurate in finite samples. In experiments, we find that traditional quantile regression can have poor conditional coverage. To remedy this, we modify the loss function to promote independence between the size of the intervals and the indicator of a miscoverage event. For the true conditional quantiles, these two quantities are independent (orthogonal), so the modified loss function remains valid. Moreover, we show empirically that the modified loss function leads to improved conditional coverage, as evaluated by several metrics. We also introduce two new metrics that check conditional coverage by measuring the strength of the dependence between the interval size and the indicator of miscoverage.
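As a rough illustration of the idea, the sketch below combines the standard pinball (quantile) loss with a penalty on the dependence between interval length and the coverage indicator. The abstract does not say which dependence measure is used, so the Pearson correlation here is an assumption, as are the function names and the parameters `alpha` and `gamma`. The hard coverage indicator is not differentiable, so gradient-based training would need a smooth surrogate; this version is only suitable for evaluation.

```python
import numpy as np


def pinball_loss(y, q_pred, tau):
    """Average pinball (quantile) loss of predictions q_pred at level tau."""
    diff = np.asarray(y, dtype=float) - np.asarray(q_pred, dtype=float)
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))


def coverage_indicator(y, lower, upper):
    """1.0 where y falls inside [lower, upper], 0.0 where it is miscovered."""
    y, lower, upper = (np.asarray(a, dtype=float) for a in (y, lower, upper))
    return ((y >= lower) & (y <= upper)).astype(float)


def length_coverage_correlation(y, lower, upper):
    """Pearson correlation between interval length and coverage indicator.

    Assumed dependence measure (not taken from the abstract). Returns 0.0
    when either quantity is constant, since the correlation is undefined.
    """
    length = np.asarray(upper, dtype=float) - np.asarray(lower, dtype=float)
    covered = coverage_indicator(y, lower, upper)
    if np.std(length) == 0.0 or np.std(covered) == 0.0:
        return 0.0
    return float(np.corrcoef(length, covered)[0, 1])


def penalized_interval_loss(y, lower, upper, alpha=0.1, gamma=1.0):
    """Pinball loss for both interval endpoints plus a dependence penalty.

    alpha is the target miscoverage rate; gamma (hypothetical name) weights
    the penalty on the absolute length/coverage correlation.
    """
    base = pinball_loss(y, lower, alpha / 2) + pinball_loss(y, upper, 1 - alpha / 2)
    penalty = abs(length_coverage_correlation(y, lower, upper))
    return base + gamma * penalty
```

The same correlation can serve as a diagnostic on held-out data: a value near zero is consistent with conditional coverage, while a large absolute value suggests that coverage varies with interval size.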