Concentration inequalities for the sample mean, such as those of Bernstein and Hoeffding, are valid for any sample size but overly conservative, yielding confidence intervals that are unnecessarily wide. The central limit theorem (CLT) provides asymptotic confidence intervals with optimal width, but these carry no coverage guarantee at any finite sample size. To resolve this tension, we develop new computable concentration inequalities with asymptotically optimal size, finite-sample validity, and sub-Gaussian decay. These bounds enable the construction of efficient confidence intervals with correct coverage for any sample size. We derive our inequalities by tightly bounding the Hellinger distance, Stein discrepancy, non-uniform Kolmogorov distance, and Wasserstein distance to a Gaussian, and, as a byproduct, we obtain the first explicit bounds for the Hellinger CLT.
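To make the tension between finite-sample validity and asymptotic width concrete, the following minimal sketch compares the half-width of a two-sided Hoeffding interval with that of the usual CLT interval. It does not implement the new inequalities described in the abstract. The function names, the restriction to i.i.d. observations bounded in [lo, hi], and the assumption that the standard deviation sigma is known are illustrative choices, not taken from the source.

```python
import math
from statistics import NormalDist


def hoeffding_half_width(n, delta, lo=0.0, hi=1.0):
    """Half-width of a two-sided Hoeffding interval for the mean of
    n i.i.d. observations bounded in [lo, hi], at confidence 1 - delta.

    Obtained by solving 2 * exp(-2 * n * t**2 / (hi - lo)**2) = delta for t.
    """
    return (hi - lo) * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def clt_half_width(n, delta, sigma):
    """Half-width of the asymptotic (CLT) interval z * sigma / sqrt(n),
    where z is the 1 - delta/2 standard normal quantile.

    sigma is assumed known here purely for illustration.
    """
    z = NormalDist().inv_cdf(1.0 - delta / 2.0)
    return z * sigma / math.sqrt(n)


if __name__ == "__main__":
    delta = 0.05
    sigma = 0.1  # hypothetical standard deviation of a [0, 1]-valued variable
    for n in (100, 1_000, 10_000):
        h = hoeffding_half_width(n, delta)
        c = clt_half_width(n, delta, sigma)
        print(f"n={n:>6}  Hoeffding={h:.4f}  CLT={c:.4f}  ratio={h / c:.2f}")
```

Both half-widths shrink at the rate 1/sqrt(n), so their ratio does not depend on n. In this simple setting the gap comes from the constant: Hoeffding depends only on the range hi - lo, whereas the CLT width scales with sigma, which can be much smaller than the range.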