In information theory, one major goal is to find useful functions that summarize the amount of information contained in the interaction of several random variables. Specifically, one can ask how the classical Shannon entropy, mutual information, and higher interaction information functions relate to each other. This is formally answered by Hu's theorem, which is widely known in the form of information diagrams: it relates disjoint unions of shapes in a Venn diagram to summation rules of information functions, thereby establishing a bridge from set theory to information theory. While a proof of this theorem is known, to date it had not been analyzed in detail to determine in what generality it can be established. In this work, we view random variables together with the joint operation as a monoid that acts by conditioning on information functions, and entropy as the unique function satisfying the chain rule of information. This allows us to abstract away from Shannon's theory and to prove a generalization of Hu's theorem, which applies to Shannon entropy of countably infinite discrete random variables, Kolmogorov complexity, Tsallis entropy, (Tsallis) Kullback-Leibler divergence, cross-entropy, submodular information functions, and the generalization error in machine learning. Our result implies for Chaitin's prefix-free Kolmogorov complexity that the higher-order interaction complexities of all degrees are in expectation close to Shannon interaction information. For well-behaved probability distributions on increasing sequence lengths, this shows that asymptotically, the per-bit expected interaction complexity and information coincide, thus establishing a strong bridge between algorithmic and classical information theory.
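The correspondence between Venn-diagram regions and information functions described above can be made concrete with a small numerical sketch. The following illustration (not taken from the paper, and using hypothetical variable names) computes Shannon entropies from a random joint distribution over three binary variables and evaluates mutual information and interaction information via inclusion-exclusion over joint entropies, exactly as information diagrams suggest; the mutual information is cross-checked against its direct definition.

```python
import itertools
import math
import random

def H(p):
    """Shannon entropy (in bits) of a distribution given as {outcome: prob}."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def marginal(joint, idxs):
    """Marginalize a joint distribution onto the coordinates in idxs."""
    m = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in idxs)
        m[key] = m.get(key, 0.0) + p
    return m

# A small random joint distribution over three binary variables (X, Y, Z).
random.seed(0)
weights = [random.random() for _ in range(8)]
total = sum(weights)
joint = {xyz: w / total
         for xyz, w in zip(itertools.product((0, 1), repeat=3), weights)}

HX, HY, HZ = (H(marginal(joint, [i])) for i in (0, 1, 2))
HXY = H(marginal(joint, [0, 1]))
HXZ = H(marginal(joint, [0, 2]))
HYZ = H(marginal(joint, [1, 2]))
HXYZ = H(joint)

# Mutual information as inclusion-exclusion over two "sets":
I_XY = HX + HY - HXY

# Interaction information as inclusion-exclusion over three "sets":
I_XYZ = HX + HY + HZ - HXY - HXZ - HYZ + HXYZ

# Cross-check: mutual information computed directly from its definition,
# I(X;Y) = sum_{x,y} p(x,y) log2( p(x,y) / (p(x) p(y)) ).
pX, pY = marginal(joint, [0]), marginal(joint, [1])
pXY = marginal(joint, [0, 1])
I_XY_direct = sum(p * math.log2(p / (pX[(x,)] * pY[(y,)]))
                  for (x, y), p in pXY.items() if p > 0)
```

Note that, unlike mutual information, the interaction information `I_XYZ` of three variables may be negative; the generalized Hu theorem concerns precisely such signed inclusion-exclusion identities for a broad class of information functions.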