Concentration inequalities for the sample mean, like those due to Bernstein and Hoeffding, are valid for any sample size but overly conservative, yielding confidence intervals that are unnecessarily wide. The central limit theorem (CLT) provides asymptotic confidence intervals with optimal width, but these intervals carry no validity guarantee at any finite sample size. To resolve this tension, we develop new computable concentration inequalities with asymptotically optimal size, finite-sample validity, and sub-Gaussian decay. These bounds enable the construction of efficient confidence intervals with correct coverage for any sample size. We derive our inequalities by tightly bounding the Hellinger distance, Stein discrepancy, and Wasserstein distance to a Gaussian, and, as a byproduct, we obtain the first explicit bounds for the Hellinger CLT.
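To make the tension described in the abstract concrete, the following minimal sketch compares the half-width of a two-sided Hoeffding confidence interval with that of the asymptotic CLT interval. It does not implement the new inequalities of this work. The function names, the assumption that observations lie in a known range [lo, hi], and the assumption that the standard deviation sigma is known exactly are all illustrative choices, not taken from the paper.

```python
import math
from statistics import NormalDist


def hoeffding_half_width(n, delta, lo=0.0, hi=1.0):
    """Half-width of a two-sided Hoeffding interval at level 1 - delta.

    Assumes n i.i.d. observations bounded in [lo, hi] (illustrative assumption).
    Derived from P(|mean - mu| >= t) <= 2 * exp(-2 * n * t**2 / (hi - lo)**2).
    """
    return (hi - lo) * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def clt_half_width(n, delta, sigma):
    """Half-width of the asymptotic CLT interval at level 1 - delta.

    Assumes the true standard deviation sigma is known (illustrative
    assumption); the interval has no finite-sample coverage guarantee.
    """
    z = NormalDist().inv_cdf(1.0 - delta / 2.0)  # standard normal quantile
    return z * sigma / math.sqrt(n)


if __name__ == "__main__":
    delta = 0.05
    sigma = 0.5  # largest possible standard deviation for values in [0, 1]
    for n in (10, 100, 1000):
        h = hoeffding_half_width(n, delta)
        c = clt_half_width(n, delta, sigma)
        print(f"n={n:5d}  Hoeffding={h:.4f}  CLT={c:.4f}  ratio={h / c:.3f}")
```

Both half-widths shrink at the rate 1/sqrt(n), so in this sketch their ratio does not depend on n. The Hoeffding interval stays wider by a constant factor even in the worst case sigma = (hi - lo) / 2, and the gap grows as sigma decreases, since Hoeffding's bound ignores the variance.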