In data analysis problems where we are not able to rely on distributional assumptions, what types of inference guarantees can still be obtained? Many popular methods, such as holdout methods, cross-validation methods, and conformal prediction, are able to provide distribution-free guarantees for predictive inference, but the problem of providing inference for the underlying regression function (for example, inference on the conditional mean $\mathbb{E}[Y|X]$) is more challenging. In the setting where the features $X$ are continuously distributed, recent work has established that any confidence interval for $\mathbb{E}[Y|X]$ must have non-vanishing width, even as sample size tends to infinity. At the other extreme, if $X$ takes only a small number of possible values, then inference on $\mathbb{E}[Y|X]$ is trivial to achieve. In this work, we study the problem in settings in between these two extremes. We find that there are several distinct regimes in between the finite setting and the continuous setting, where vanishing-width confidence intervals are achievable if and only if the effective support size of the distribution of $X$ is smaller than the square of the sample size.