Time-uniform central limit theory and asymptotic confidence sequences

Confidence intervals based on the central limit theorem (CLT) are a cornerstone of classical statistics. Despite being only asymptotically valid, they are ubiquitous because they permit statistical inference under very weak assumptions and can often be applied even to problems where nonasymptotic inference is impossible. This paper introduces time-uniform analogues of such asymptotic confidence intervals. Specifically, our methods take the form of confidence sequences (CSs), which are sequences of confidence intervals that are uniformly valid over time. CSs provide valid inference at arbitrary stopping times and incur no penalty for "peeking" at the data, unlike classical confidence intervals, which require the sample size to be fixed in advance. Existing CSs in the literature are nonasymptotic and hence do not enjoy the broad applicability of asymptotic confidence intervals. Our work bridges this gap by defining "asymptotic CSs" and deriving a universal asymptotic CS that requires only weak CLT-like assumptions. Whereas the CLT approximates the distribution of a sample average at a fixed sample size by a Gaussian distribution, we use strong invariance principles (stemming from the seminal 1960s work of Strassen and the later improvements of Komlós, Major, and Tusnády) to uniformly approximate the entire sample-average process by an implicit Gaussian process. We demonstrate the utility of asymptotic CSs by deriving nonparametric asymptotic CSs for the average treatment effect in observational studies, based on doubly robust estimators; for this problem, no nonasymptotic methods can exist even in the fixed-time regime. This enables causal inference that can be continuously monitored and adaptively stopped.
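The abstract does not give the form of the universal asymptotic CS. As a minimal sketch, assuming the centred partial sums of the data behave like σ times a standard Brownian motion (the kind of Gaussian approximation that strong invariance principles supply asymptotically), one standard way to obtain a time-uniform boundary is Robbins' normal-mixture construction combined with Ville's inequality. Plugging in a running standard-deviation estimate gives the interval $\hat\mu_t \pm \hat\sigma_t\sqrt{\frac{2(t\rho^2+1)}{t^2\rho^2}\log\frac{\sqrt{t\rho^2+1}}{\alpha}}$, which can only be expected to hold asymptotically. The function names `normal_mixture_radius` and `running_asymptotic_cs`, the tuning parameter `rho`, and the burn-in in the usage block are hypothetical illustrations and may differ in detail from the paper's own construction.

```python
import math
from typing import Iterable, List, Tuple


def normal_mixture_radius(t: int, sigma: float, alpha: float = 0.05, rho: float = 1.0) -> float:
    """Half-width at time t of a two-sided normal-mixture boundary for a sample mean.

    Assumption: S_t = sum_{i<=t} (X_i - mu) is modelled as sigma * W(t) for a
    standard Brownian motion W. Mixing exp(lam*W(t) - lam^2*t/2) over
    lam ~ N(0, rho^2) and applying Ville's inequality gives |S_t / t| < radius(t)
    simultaneously for all t with probability >= 1 - alpha (under that model).
    rho > 0 is a hypothetical tuning parameter controlling where the boundary is tightest.
    """
    if t < 1 or sigma < 0 or not 0 < alpha < 1 or rho <= 0:
        raise ValueError("need t >= 1, sigma >= 0, 0 < alpha < 1, rho > 0")
    v = t * rho**2 + 1.0
    return sigma * math.sqrt(2.0 * v / (t**2 * rho**2) * math.log(math.sqrt(v) / alpha))


def running_asymptotic_cs(
    xs: Iterable[float], alpha: float = 0.05, rho: float = 1.0
) -> List[Tuple[float, float]]:
    """Return the (lower, upper) interval after each observation.

    Uses Welford's online update for the running mean and the plug-in
    (population-form) standard deviation. At t == 1 the plug-in is 0, so the
    earliest intervals are degenerate; the guarantee is only asymptotic.
    """
    intervals: List[Tuple[float, float]] = []
    mean, m2 = 0.0, 0.0
    for t, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / t
        m2 += delta * (x - mean)  # running sum of squared deviations
        sigma_hat = math.sqrt(m2 / t)
        r = normal_mixture_radius(t, sigma_hat, alpha, rho)
        intervals.append((mean - r, mean + r))
    return intervals


if __name__ == "__main__":
    import random

    random.seed(0)
    stream = [random.gauss(0.3, 1.0) for _ in range(20000)]
    for t, (lo, hi) in enumerate(running_asymptotic_cs(stream), start=1):
        # Hypothetical burn-in of 30 observations to skip degenerate early plug-ins.
        # Stopping at this data-dependent time is what time-uniform validity permits.
        if t >= 30 and lo > 0:
            print(f"stopped at t={t}: [{lo:.3f}, {hi:.3f}]")
            break
```

Because the boundary holds simultaneously over all t, the stopping rule above (stop the first time the interval excludes 0) does not inflate the error rate beyond the approximation error, unlike repeatedly checking a fixed-n CLT interval. For the observational-study application, one would presumably feed per-unit doubly robust pseudo-outcomes (for example, AIPW scores) into the same routine instead of raw observations; that substitution is an assumption of this sketch, not a detail the abstract specifies.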
