Donoho and Kipnis (2022) showed that the the higher criticism (HC) test statistic has a non-Gaussian phase transition but remarked that it is probably not optimal, in the detection of sparse differences between two large frequency tables when the counts are low. The setting can be considered to be heterogeneous, with cells containing larger total counts more able to detect smaller differences. We provide a general study here of sparse detection arising from such heterogeneous settings, and showed that optimality of the HC test statistic requires thresholding, for example in the case of frequency table comparison, to restrict to p-values of cells with total counts exceeding a threshold. The use of thresholding also leads to optimality of the HC test statistic when it is applied on the sparse Poisson means model of Arias-Castro and Wang (2015). The phase transitions we consider here are non-Gaussian, and involve an interplay between the rate functions of the response and sample size distributions. We also showed, both theoretically and in a numerical study, that applying thresholding to the Bonferroni test statistic results in better sparse mixture detection in heterogeneous settings.
翻译:暂无翻译