We present an impossibility result, called a theorem about facts and words, which pertains to a general communication system. The theorem states that the number of distinct words used in a finite text is roughly greater than the number of independent elementary persistent facts described in the same text. In particular, this theorem can be related to Zipf's law, the power-law scaling of mutual information, and power-law-tailed learning curves. The assumptions of the theorem are: a finite alphabet, a linear sequence of symbols, a complexity that does not decrease in time, an entropy rate that can be estimated, and finiteness of the inverse complexity rate.
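To make the quantity "number of distinct words used in a finite text" concrete, the following is a minimal sketch that counts distinct word types in growing prefixes of a text. It assumes that words are lowercase whitespace-separated tokens, which is only an illustrative choice; the theorem's own notion of a word may be defined differently. The function name `distinct_word_counts` and its `step` parameter are hypothetical, and the sketch says nothing about how facts would be counted.

```python
from typing import List


def distinct_word_counts(text: str, step: int = 1) -> List[int]:
    """Count distinct word types seen after every `step` tokens.

    Assumption: a "word" is a lowercase whitespace-separated token.
    This is an illustrative tokenization, not the theorem's definition.
    """
    seen = set()      # word types observed so far
    counts = []       # vocabulary size at each checkpoint
    tokens = text.lower().split()
    for i, token in enumerate(tokens, start=1):
        seen.add(token)
        if i % step == 0:
            counts.append(len(seen))
    return counts


if __name__ == "__main__":
    sample = "the cat saw the dog and the dog saw the cat"
    # Vocabulary size as a function of prefix length (in tokens).
    print(distinct_word_counts(sample))
```

Plotting these counts against prefix length for a long text gives the vocabulary growth curve, the kind of empirical quantity that is usually discussed alongside Zipf's law.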