In this paper we consider the uniformity testing problem for high-dimensional discrete distributions (multinomials) under sparse alternatives. More precisely, we derive sharp detection thresholds for testing, based on $n$ samples, whether a discrete distribution supported on $d$ elements differs from the uniform distribution only in $s$ (out of the $d$) coordinates and is $\varepsilon$-far (in total variation distance) from uniformity. Our results reveal various interesting phase transitions which depend on the interplay of the sample size $n$ and the signal strength $\varepsilon$ with the dimension $d$ and the sparsity level $s$. For instance, if the sample size is less than a threshold (which depends on $d$ and $s$), then all tests are asymptotically powerless, irrespective of the magnitude of the signal strength. On the other hand, if the sample size is above the threshold, then the detection boundary undergoes a further phase transition depending on the signal strength. Here, a $\chi^2$-type test attains the detection boundary in the dense regime, whereas in the sparse regime a Bonferroni correction of two maximum-type tests and a version of the Higher Criticism test is optimal up to sharp constants. These results combined provide a complete description of the phase diagram for the sparse uniformity testing problem across all regimes of the parameters $n$, $d$, and $s$. One of the challenges in dealing with multinomials is that the parameters are always constrained to lie in the simplex. This results in the aforementioned two-layered phase transition, a new phenomenon which does not arise in classical high-dimensional sparse testing problems.
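To make the setting concrete, the following is a minimal Python sketch of the objects named in the abstract: a sparse alternative that differs from uniform in $s$ coordinates at total variation distance $\varepsilon$, a Pearson $\chi^2$ statistic, and a pair of maximum-type statistics. The function names, the choice of raising half of the perturbed coordinates and lowering the other half by the same amount, and the standardization used for the maximum-type statistics are illustrative assumptions, not the exact constructions or calibrated tests analyzed in the paper.

```python
import random
from collections import Counter


def sparse_alternative(d, s, eps):
    """Hypothetical sparse alternative: raise s/2 coordinates and lower s/2
    coordinates of the uniform distribution by delta = 2*eps/s each, so that
    the total variation distance from uniform equals eps."""
    if s % 2 != 0 or not 0 < s <= d:
        raise ValueError("this sketch assumes an even s with 0 < s <= d")
    delta = 2.0 * eps / s
    if delta > 1.0 / d:
        # Simplex constraint: a coordinate cannot drop below zero.
        raise ValueError("eps is too large for this d and s")
    p = [1.0 / d] * d
    for i in range(s // 2):
        p[i] += delta
        p[s // 2 + i] -= delta
    return p


def total_variation(p):
    """Total variation distance between p and the uniform distribution."""
    d = len(p)
    return 0.5 * sum(abs(pi - 1.0 / d) for pi in p)


def sample_counts(p, n, rng):
    """Draw n i.i.d. samples from p and return the per-coordinate counts."""
    tally = Counter(rng.choices(range(len(p)), weights=p, k=n))
    return [tally.get(i, 0) for i in range(len(p))]


def chi_square_stat(counts):
    """Pearson statistic sum_i (N_i - n/d)^2 / (n/d) against the uniform null."""
    n, d = sum(counts), len(counts)
    expected = n / d
    return sum((c - expected) ** 2 for c in counts) / expected


def max_stats(counts):
    """Largest and smallest standardized counts under the uniform null
    (one plausible form of a pair of maximum-type statistics)."""
    n, d = sum(counts), len(counts)
    expected = n / d
    sd = (expected * (1.0 - 1.0 / d)) ** 0.5
    z = [(c - expected) / sd for c in counts]
    return max(z), min(z)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    d, s, eps, n = 1000, 10, 0.004, 20000
    null_counts = sample_counts([1.0 / d] * d, n, rng)
    alt_counts = sample_counts(sparse_alternative(d, s, eps), n, rng)
    print("chi-square (null, alt):", chi_square_stat(null_counts), chi_square_stat(alt_counts))
    print("max-type   (null, alt):", max_stats(null_counts), max_stats(alt_counts))
```

In this construction the check `delta > 1/d` enforces $\varepsilon \le s/(2d)$, which gives one concrete view of the simplex constraint mentioned in the abstract: the achievable signal strength is capped by the sparsity level and the dimension. Turning these statistics into actual tests would additionally require rejection thresholds, which the sketch does not attempt to supply.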