Conformal prediction is a popular, modern technique for providing valid predictive inference for arbitrary machine learning models. Its validity relies on the assumptions of exchangeability of the data, and symmetry of the given model fitting algorithm as a function of the data. However, exchangeability is often violated when predictive models are deployed in practice. For example, if the data distribution drifts over time, then the data points are no longer exchangeable; moreover, in such settings, we might want to use a nonsymmetric algorithm that treats recent observations as more relevant. This paper generalizes conformal prediction to deal with both aspects: we employ weighted quantiles to introduce robustness against distribution drift, and design a new randomization technique to allow for algorithms that do not treat data points symmetrically. Our new methods are provably robust, with substantially less loss of coverage when exchangeability is violated due to distribution drift or other challenging features of real data, while also achieving the same coverage guarantees as existing conformal prediction methods if the data points are in fact exchangeable. We demonstrate the practical utility of these new tools with simulations and real-data experiments on electricity and election forecasting.
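To make the weighted-quantile idea concrete, the following is a minimal sketch, not the paper's reference implementation. It assumes a split-conformal setting with absolute-residual scores on a held-out calibration set, and it assumes geometrically decaying weights so that recent observations count more. The function names (`decaying_weights`, `weighted_quantile_with_inf`, `weighted_split_conformal_interval`) and the decay rate `rho` are hypothetical choices made for illustration. The test point is given unit weight on a point mass at +infinity, so the interval becomes infinite when the calibration weights are too small to reach the target level.

```python
import numpy as np


def decaying_weights(n, rho=0.99):
    """Illustrative weights (an assumption): rho**(n+1-i) for i = 1..n,
    so the most recent calibration point gets the largest weight."""
    return rho ** np.arange(n, 0, -1)


def weighted_quantile_with_inf(scores, weights, alpha):
    """(1 - alpha) quantile of the weighted empirical distribution of
    `scores`, augmented with a unit-weight point mass at +infinity that
    stands in for the unseen test point."""
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    total = weights.sum() + 1.0            # +1 is the test point's weight
    order = np.argsort(scores)
    cum = np.cumsum(weights[order]) / total
    # first sorted score whose cumulative normalized weight reaches 1 - alpha
    idx = np.searchsorted(cum, 1.0 - alpha, side="left")
    if idx >= len(scores):
        return np.inf                      # only the mass at +infinity reaches the level
    return float(scores[order][idx])


def weighted_split_conformal_interval(pred, cal_residuals, weights, alpha=0.1):
    """Symmetric interval around a point prediction `pred`, with half-width
    given by the weighted quantile of absolute calibration residuals."""
    q = weighted_quantile_with_inf(np.abs(cal_residuals), weights, alpha)
    return pred - q, pred + q
```

With all weights equal to 1, this sketch reduces to the usual split-conformal quantile; choosing smaller weights for older calibration points is one simple way to discount data that may no longer reflect the current distribution.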