Distribution testing can be described as follows: $q$ samples are drawn from an unknown distribution $P$ over a known domain $[n]$. After sampling, one must decide whether $P$ has some property or is far from having it. The most studied problem in the field is arguably uniformity testing, where one needs to distinguish the case that $P$ is uniform over $[n]$ from the case that $P$ is $\epsilon$-far from uniform (in $\ell_1$ distance). In the classic model, it is known that $\Theta\left(\sqrt{n}/\epsilon^2\right)$ samples are necessary and sufficient for this task. The problem was recently considered in various restricted models that impose, for example, communication or memory constraints. On more than one occasion, the known optimal solution boils down to counting collisions among the drawn samples (each pair of samples with the same value adds one to the count), an idea that dates back to the first uniformity tester and is known as the "collision-based tester". In this paper, we introduce the notion of comparison graphs and use it to formally define a generalized collision-based tester. Roughly speaking, the edges of the graph tell the tester which pairs of samples to compare (in particular, the original tester is induced by a clique, in which all pairs are compared). We prove a structural theorem that gives a sufficient condition for a comparison graph to induce a good uniformity tester. As an application, we develop a generic method for testing uniformity and devise nearly optimal uniformity testers under various computational constraints. We improve and simplify a few known results, and introduce a new constrained model in which the method also produces an efficient tester. The idea behind our method is to translate the computational constraints of a given model into constraints on the comparison graph, which paves the way to finding a good graph.
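
The comparison-graph idea can be illustrated with a minimal Python sketch. This is not the paper's construction: the function names (`clique_edges`, `count_collisions`, `collision_tester`) and the acceptance threshold are assumptions chosen for illustration. The threshold relies on the standard fact that two independent samples collide with probability $1/n$ under the uniform distribution and with probability at least $(1+\epsilon^2)/n$ when $P$ is $\epsilon$-far from uniform in $\ell_1$; the sketch accepts when the collision count falls below the midpoint of the two expectations.

```python
import itertools


def clique_edges(q):
    """Comparison graph of the classic tester: every pair of the q samples."""
    return list(itertools.combinations(range(q), 2))


def count_collisions(samples, edges):
    """Count the edges (i, j) whose two endpoint samples have the same value."""
    return sum(1 for i, j in edges if samples[i] == samples[j])


def collision_tester(samples, edges, n, eps):
    """Return True (accept "uniform") if the collision count is small.

    Illustrative threshold (an assumption, not taken from the paper):
    the midpoint between the expected count |E|/n under uniformity and
    the lower bound |E|(1 + eps^2)/n for an eps-far distribution.
    """
    threshold = len(edges) * (1 + eps ** 2 / 2) / n
    return count_collisions(samples, edges) <= threshold


if __name__ == "__main__":
    samples = [1, 2, 1, 1]
    # Clique: compares all 6 pairs, as in the original collision-based tester.
    print(count_collisions(samples, clique_edges(4)))
    # Path graph: compares only consecutive samples, a sparser comparison graph.
    print(count_collisions(samples, [(0, 1), (1, 2), (2, 3)]))
```

Replacing the clique with a sparser edge set, such as the path above, changes which pairs are compared but not the counting rule. Whether a particular sparse graph still yields a good tester is the question the structural theorem addresses, and this sketch makes no claim about it.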