Many practical tasks involve sampling sequentially without replacement (WoR) from a finite population of size $N$ to estimate some parameter $\theta^\star$. Accurately quantifying uncertainty throughout this process is nontrivial, but it is necessary because it often determines when we stop collecting samples and confidently report a result. We present a suite of tools for designing \textit{confidence sequences} (CS) for $\theta^\star$. A CS is a sequence of confidence sets $(C_n)_{n=1}^N$ that shrink in size and all contain $\theta^\star$ simultaneously with high probability. We present a generic approach to constructing a frequentist CS using Bayesian tools, based on the fact that the ratio of a prior to the posterior at the ground truth is a martingale. We then present Hoeffding- and empirical-Bernstein-type time-uniform CSs and fixed-time confidence intervals for sampling WoR, which improve on previous bounds in the literature and explicitly quantify the benefit of WoR sampling.
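To make the prior-posterior-ratio idea concrete, here is a minimal sketch for one special case: a population of $N$ binary items where $\theta^\star$ is the unknown number of ones. It is an illustration under stated assumptions, not the authors' implementation. The uniform prior, the function names `log_falling` and `prior_posterior_cs`, and the toy population are all hypothetical choices made for this example. Because the ratio of prior to posterior at $\theta^\star$ is a nonnegative martingale starting at 1, Ville's inequality suggests keeping every candidate $\theta$ whose posterior mass exceeds $\alpha$ times its prior mass.

```python
import math
import random


def log_falling(a, b):
    """Log of the falling factorial a*(a-1)*...*(a-b+1); -inf if b > a."""
    if b > a:
        return -math.inf
    return math.lgamma(a + 1) - math.lgamma(a - b + 1)


def prior_posterior_cs(xs, N, alpha=0.05):
    """Return one confidence set (list of candidate counts) per WoR draw.

    xs    : observed 0/1 draws, in sampling order
    N     : population size
    alpha : error level; theta is kept while posterior(theta) > alpha * prior(theta)
    """
    # Assumption for illustration: uniform prior over the number of ones.
    prior = [1.0 / (N + 1)] * (N + 1)
    sets = []
    k = 0  # running count of ones observed so far
    for n, x in enumerate(xs, start=1):
        k += x
        # Log-likelihood of the observed ordered sequence if the population
        # holds m ones. The factor N!/(N-n)! is common to all m and cancels.
        loglik = [log_falling(m, k) + log_falling(N - m, n - k)
                  for m in range(N + 1)]
        mx = max(loglik)
        weights = [prior[m] * math.exp(ll - mx) for m, ll in enumerate(loglik)]
        total = sum(weights)
        post = [w / total for w in weights]
        # Keep theta whose prior/posterior ratio is still below 1/alpha.
        sets.append([m for m in range(N + 1) if post[m] > alpha * prior[m]])
    return sets


# Toy population: 30 ones among N = 100 items, sampled in random order.
population = [1] * 30 + [0] * 70
random.Random(0).shuffle(population)
sets = prior_posterior_cs(population, N=100)
```

Once all $N$ items have been drawn, the posterior puts all its mass on the observed total, so the final set in this sketch collapses to the true count. Earlier sets automatically exclude counts that are incompatible with the draws seen so far, which is one way the benefit of WoR sampling shows up.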