Adjusted Kolmogorov Complexity of Binary Words with Empirical Entropy Normalization

The Kolmogorov complexity of a finite binary word reflects both its algorithmic structure and the empirical distribution of the symbols it contains. Words whose symbol frequencies are far from one half have less combinatorial richness and therefore appear less complex under the standard definition. This paper introduces an entropy-normalized complexity measure that divides the Kolmogorov complexity of a word by the empirical entropy of its observed distribution of zeros and ones. The adjustment separates intrinsic descriptive complexity from the purely combinatorial effect of symbol imbalance. For Martin-Löf random sequences under constructive exchangeable measures, the adjusted complexity grows linearly in the word length, and its ratio to the length converges to one. A pathological construction shows that regularity of the underlying measure is essential for this result. The proposed framework connects Kolmogorov complexity, empirical entropy, and randomness in a natural way and suggests applications in randomness testing and in the analysis of structured binary data.
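As a rough illustration of the normalization described above, the following Python sketch divides a complexity estimate by the empirical entropy of the word. Kolmogorov complexity is not computable, so the sketch substitutes the zlib-compressed length in bits, which is only a crude upper-bound proxy and is dominated by compressor overhead on short words. The function names, the choice of per-symbol entropy in bits, and the convention of returning infinity for constant words are all assumptions made for this example and are not taken from the paper.

```python
import math
import zlib


def empirical_entropy(word: str) -> float:
    """Binary Shannon entropy, in bits per symbol, of the observed 0/1 frequencies."""
    if not word or set(word) - {"0", "1"}:
        raise ValueError("word must be a non-empty string of '0' and '1'")
    p = word.count("1") / len(word)
    if p in (0.0, 1.0):
        return 0.0  # a constant word has zero empirical entropy
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def complexity_proxy_bits(word: str) -> int:
    """Assumption: zlib-compressed length in bits stands in for K(word)."""
    # Pack the bit string into bytes so the compressor sees the raw bits.
    packed = int(word, 2).to_bytes((len(word) + 7) // 8, "big")
    return 8 * len(zlib.compress(packed, 9))


def adjusted_complexity(word: str) -> float:
    """Complexity proxy divided by empirical entropy (hypothetical convention)."""
    h = empirical_entropy(word)
    if h == 0.0:
        return math.inf  # assumed convention for constant words
    return complexity_proxy_bits(word) / h
```

Under these assumptions, dividing `adjusted_complexity(word)` by `len(word)` for long words gives a quantity that can be compared against the limit of one stated for random sequences, keeping in mind that a general-purpose compressor can only overestimate the true complexity.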
