Conformal prediction provides prediction sets with coverage guarantees. The informativeness of conformal prediction depends on its efficiency, typically quantified by the expected size of the prediction set. Prior work on the efficiency of conformalized regression commonly treats the miscoverage level $α$ as a fixed constant. In this work, we establish non-asymptotic bounds on the deviation of the prediction set length from the oracle interval length for conformalized quantile and median regression trained via SGD, under mild assumptions on the data distribution. Our bounds of order $\mathcal{O}(1/\sqrt{n} + 1/(α^2 n) + 1/\sqrt{m} + \exp(-α^2 m))$ capture the joint dependence of efficiency on the proper training set size $n$, the calibration set size $m$, and the miscoverage level $α$. The results identify phase transitions in convergence rates across different regimes of $α$, offering guidance for allocating data to control excess prediction set length. Empirical results are consistent with our theoretical findings.
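To make the joint dependence on $n$, $m$, and $α$ concrete, the following minimal sketch evaluates the four terms of the stated rate and reports which one is largest. It assumes every hidden constant equals 1, which the abstract does not state, so the numbers indicate only relative scaling and not actual excess lengths. The helper names `rate_terms` and `dominant_term` are hypothetical. Under this unit-constant assumption, $1/(α^2 n)$ exceeds $1/\sqrt{n}$ exactly when $α < n^{-1/4}$; the constants in the actual bounds may shift that crossover.

```python
import math


def rate_terms(n, m, alpha):
    """Return the four terms of the stated rate, with all constants set to 1 (an assumption)."""
    return {
        "1/sqrt(n)": 1.0 / math.sqrt(n),
        "1/(alpha^2 n)": 1.0 / (alpha ** 2 * n),
        "1/sqrt(m)": 1.0 / math.sqrt(m),
        "exp(-alpha^2 m)": math.exp(-(alpha ** 2) * m),
    }


def dominant_term(n, m, alpha):
    """Name of the largest term; ties go to the first term in the order above."""
    terms = rate_terms(n, m, alpha)
    return max(terms, key=terms.get)


if __name__ == "__main__":
    n = m = 10_000  # illustrative sizes, not taken from the paper
    for alpha in (0.2, 0.05, 0.01, 0.001):
        terms = rate_terms(n, m, alpha)
        summary = ", ".join(f"{name}={value:.3g}" for name, value in terms.items())
        print(f"alpha={alpha}: {summary} -> largest: {dominant_term(n, m, alpha)}")
```

Sweeping `alpha` downward at fixed `n` and `m` shows the $α$-dependent terms overtaking the $1/\sqrt{n}$ and $1/\sqrt{m}$ terms, which is the kind of regime change the abstract describes as a phase transition.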