Uniformity testing is one of the best-studied problems in property testing, with many known test statistics, including ones based on counting collisions, counting singletons, and the empirical TV distance. The optimal sample complexity to distinguish the uniform distribution on $m$ elements from any $\epsilon$-far distribution with probability $1-\delta$ is known to be $n = \Theta\left(\frac{\sqrt{m \log (1/\delta)}}{\epsilon^2} + \frac{\log (1/\delta)}{\epsilon^2}\right)$, which is achieved by the empirical TV tester. Yet in simulation, these theoretical analyses are misleading: in many cases, they do not correctly rank the performance of existing testers, even in an asymptotic regime where all parameters tend to $0$ or $\infty$. We explain this discrepancy by studying the \emph{constant factors} required by the algorithms. We show that the collisions tester achieves a sharp maximal constant in the number of standard deviations of separation between uniform and non-uniform inputs. We then introduce a new tester based on the Huber loss, and show that it not only matches this separation, but also has tails corresponding to a Gaussian with this separation. This leads to a sample complexity of $(1 + o(1))\frac{\sqrt{m \log (1/\delta)}}{\epsilon^2}$ in the regime where this term is dominant, unlike all other existing testers.
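For concreteness, the three classical statistics named above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, samples are assumed to be integers in `range(m)`, and the acceptance threshold in `collisions_tester` is an assumed midpoint choice. It relies on the standard facts that the expected number of colliding pairs is $\binom{n}{2}\|p\|_2^2$, which equals $\binom{n}{2}/m$ under uniform, and that any distribution $\epsilon$-far in TV has $\|p\|_2^2 \ge (1 + 4\epsilon^2)/m$. The Huber-loss tester introduced in the paper is not sketched here, since the abstract does not specify it.

```python
from collections import Counter
from math import comb


def collision_count(samples):
    """Number of pairs (i < j) with samples[i] == samples[j]."""
    counts = Counter(samples)
    return sum(comb(c, 2) for c in counts.values())


def singleton_count(samples):
    """Number of domain elements that appear exactly once."""
    counts = Counter(samples)
    return sum(1 for c in counts.values() if c == 1)


def empirical_tv_to_uniform(samples, m):
    """TV distance between the empirical distribution and uniform on m elements."""
    n = len(samples)
    counts = Counter(samples)
    # Elements that were observed contribute |N_k / n - 1 / m| each.
    seen = sum(abs(c / n - 1 / m) for c in counts.values())
    # Each unobserved element contributes |0 - 1 / m|.
    unseen = (m - len(counts)) / m
    return 0.5 * (seen + unseen)


def collisions_tester(samples, m, eps):
    """Accept (return True) if the collision count looks uniform.

    The threshold is an assumed midpoint between the uniform mean
    C(n, 2) / m and the eps-far lower bound C(n, 2) * (1 + 4 eps^2) / m.
    """
    n = len(samples)
    threshold = comb(n, 2) * (1 + 2 * eps**2) / m
    return collision_count(samples) <= threshold
```

For example, on the samples `[0, 1, 1, 2, 2, 2]` the element `1` contributes one colliding pair and `2` contributes three, so `collision_count` returns 4, and only `0` is a singleton. How sharply such a threshold separates uniform from far inputs, measured in standard deviations, is exactly the constant-factor question the abstract raises.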