Bassino et al. 2010 and Regnier et al. 1998 expressed the generating functions of the distributions of the number of occurrences of words (word distributions, for short) in finite strings as rational functions. However, the coefficients in the expansion of these rational functions are complicated, and no simple formula for the exact word distributions can be read off from them. In this paper we study the finite-dimensional generating functions of the distribution of nonoverlapping words for each fixed sample size, and we give an explicit formula for the word distributions under the Bernoulli model. We demonstrate that 1) tests based on the word distributions reject the random number generator in the BSD library with a p-value of almost zero, and 2) the word distributions can be computed for strings as long as human DNA.
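The abstract does not reproduce the explicit formula. The sketch below shows what an exact, finite-sample word distribution under the Bernoulli (i.i.d. letters) model can look like. It uses a standard inclusion-exclusion argument, which is not necessarily the authors' derivation. Let w be a nonoverlapping (bifix-free) word of length k with letter-product probability p. Two occurrences of w can never overlap, so the binomial moments of the count N in a string of length n are E[C(N, j)] = C(n - j(k-1), j) p^j. Inverting these gives P(N = m) = sum over j ≥ m of (-1)^(j-m) C(j, m) C(n - j(k-1), j) p^j. The function names `is_nonoverlapping` and `word_count_distribution` are hypothetical and chosen for illustration.

```python
from fractions import Fraction
from itertools import product
from math import comb


def is_nonoverlapping(word):
    """True if no proper prefix of `word` equals a suffix of it (bifix-free)."""
    return all(word[:i] != word[-i:] for i in range(1, len(word)))


def word_count_distribution(word, n, probs):
    """Exact P(N = m) for m = 0..n // len(word), as a list of Fractions.

    Hypothetical helper. Assumes an i.i.d. (Bernoulli) string of length n
    whose letters are drawn with probabilities `probs` (letter -> probability),
    and that `word` is nonoverlapping, so its occurrences never overlap.
    """
    if not is_nonoverlapping(word):
        raise ValueError("word must be nonoverlapping (bifix-free)")
    k = len(word)
    p = Fraction(1)
    for letter in word:
        p *= Fraction(probs[letter])
    jmax = n // k
    # Binomial moments: ways to place j disjoint copies of w, times p^j.
    S = [comb(n - j * (k - 1), j) * p**j for j in range(jmax + 1)]
    # Inclusion-exclusion turns binomial moments into point probabilities.
    return [
        sum((-1) ** (j - m) * comb(j, m) * S[j] for j in range(m, jmax + 1))
        for m in range(jmax + 1)
    ]


if __name__ == "__main__":
    dna = {c: Fraction(1, 4) for c in "ACGT"}
    dist = word_count_distribution("ACG", 10, dna)
    print(dist, sum(dist))  # the probabilities sum to exactly 1
```

Exact rational arithmetic (`fractions.Fraction`) avoids round-off in the alternating sum. For genome-scale n, the terms would need a more careful numerical treatment than this sketch provides.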