Motivated by applications in text mining and discrete distribution inference, we study the problem of testing equality of the probability mass functions of $K$ groups of high-dimensional multinomial distributions. We propose a test statistic and show that it is asymptotically standard normal under the null hypothesis. We establish the optimal detection boundary and show that the proposed test achieves it across the entire parameter space of interest. The method is demonstrated in simulation studies and applied to two real-world datasets to examine variation among consumer reviews of Amazon movies and diversity among statistical paper abstracts.