The distributions of the number of occurrences of words (the distributions of words, for short) play key roles in information theory, statistics, probability theory, ergodic theory, computer science, and DNA analysis. Bassino et al. 2010 and Regnier et al. 1998 derived generating functions of the distributions of words over all sample sizes. Robin et al. 1999 presented generating functions of the distributions of the return times of words and gave a recurrence formula for these distributions. These generating functions are rational functions; except in simple cases, they are difficult to expand into power series. In this paper, we study finite-dimensional generating functions of the distributions of nonoverlapping words for each fixed sample size and give explicit formulae for the distributions of words under Bernoulli models. We generalize our results to nonoverlapping partial words. We then study two statistical tests, one based on the number of occurrences of words and the other on the number of block-wise occurrences of words, and demonstrate that the former has significantly greater power than the latter. Finally, we apply our results to statistical tests for pseudorandom numbers.
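To make the two test statistics concrete, the following is a minimal Python sketch, not the paper's method. It rests on assumptions: a word is taken to be nonoverlapping when no proper prefix equals a suffix of the same length, "occurrences" are counted with a sliding window, and "block-wise occurrences" are counted over consecutive disjoint blocks of the word's length. The function names `is_nonoverlapping`, `sliding_count`, and `blockwise_count` are hypothetical and chosen for illustration only.

```python
def is_nonoverlapping(word: str) -> bool:
    """Assumed definition: no proper prefix of `word` equals its suffix of the same length."""
    return all(word[:k] != word[-k:] for k in range(1, len(word)))


def sliding_count(text: str, word: str) -> int:
    """Count occurrences of `word` at every starting position in `text`."""
    m = len(word)
    return sum(text[i:i + m] == word for i in range(len(text) - m + 1))


def blockwise_count(text: str, word: str) -> int:
    """Count occurrences of `word` among consecutive disjoint blocks of length len(word)."""
    m = len(word)
    return sum(text[i:i + m] == word for i in range(0, len(text) - m + 1, m))


if __name__ == "__main__":
    sample = "0101101010"  # a short binary sample, chosen by hand
    word = "10"            # nonoverlapping under the assumed definition
    print(is_nonoverlapping(word))       # True
    print(sliding_count(sample, word))   # 4
    print(blockwise_count(sample, word)) # 3
```

In this sample the sliding-window count sees every occurrence, while the block-wise count misses the one that straddles a block boundary, which gives some intuition for why a test based on the former can use more of the information in the sample.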