In this paper we consider the problem of uniformity testing with limited memory. We observe a sequence of independent, identically distributed random variables drawn from a distribution $p$ over $[n]$, which is either uniform or is $\varepsilon$-far from uniform under the total variation distance, and our goal is to determine the correct hypothesis. At each time point we are allowed to update the state of a finite-memory machine with $S$ states, where each state of the machine is assigned one of the hypotheses, and we are interested in obtaining an asymptotic probability of error of at most $\delta$, for some prescribed $0<\delta<1/2$, uniformly under both hypotheses. The main contribution of this paper is deriving upper and lower bounds on the number of states $S$ needed in order to achieve a constant error probability $\delta$, as a function of $n$ and $\varepsilon$: our upper bound is $O(\frac{n\log n}{\varepsilon})$ and our lower bound is $\Omega(n+\frac{1}{\varepsilon})$. Prior works in the field have almost exclusively used collision counting for upper bounds, and the Paninski mixture for lower bounds. Somewhat surprisingly, in the limited-memory, unlimited-samples setup, the optimal solution does not involve counting collisions, and the Paninski prior is not hard. Thus, different proof techniques are needed in order to attain our bounds.
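For concreteness, the finite-memory model can be sketched as follows (the notation $f$, $d$, and $M_t$ below is illustrative, not taken verbatim from the body of the paper): the machine is specified by a transition function $f:[S]\times[n]\to[S]$ and a decision map $d:[S]\to\{\mathsf{unif},\mathsf{far}\}$, and upon observing the sample $X_t \sim p$ it evolves as
$$
M_t = f(M_{t-1}, X_t), \qquad \hat{H}_t = d(M_t),
$$
where $X_1,X_2,\ldots$ are i.i.d. and the probability of error $\Pr[\hat{H}_t \neq H]$ is evaluated asymptotically as $t\to\infty$, under each of the two hypotheses.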