Extracting relevant information from data is crucial for all forms of learning. The information bottleneck (IB) method formalizes this, offering a mathematically precise and conceptually appealing framework for understanding learning phenomena. However, the nonlinearity of the IB problem makes it computationally expensive and analytically intractable in general. Here we derive a perturbation theory for the IB method and report the first complete characterization of the learning onset, i.e., the limit of maximum relevant information per bit extracted from data. We test our results on synthetic probability distributions, finding good agreement with the exact numerical solution near the onset of learning. We examine the differences and subtleties between our derivation and previous attempts at deriving a perturbation theory for the learning onset, and attribute the discrepancy to a flawed assumption. Our work also provides a fresh perspective on the intimate relationship between the IB method and the strong data processing inequality.
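As background, and as a standard fact about the method rather than a result specific to this paper, the IB problem can be stated as a variational principle: given a joint distribution $p(x,y)$, one seeks a stochastic encoder $p(t|x)$ of the data $X$ that preserves the information relevant to a target variable $Y$ while compressing away everything else,

\[
\min_{p(t|x)} \; I(X;T) \;-\; \beta\, I(T;Y),
\]

where $\beta \ge 0$ trades off compression, $I(X;T)$, against relevance, $I(T;Y)$. In this language, the onset of learning is the critical $\beta$ at which the trivial encoder ($T$ independent of $X$) first ceases to be optimal. Since the strong data processing inequality bounds the relevant information per extracted bit, $I(T;Y) \le \eta\, I(X;T)$ for a contraction coefficient $\eta$ determined by $p(x,y)$, the onset sits at $\beta_c = 1/\eta$, which is the sense in which it characterizes the maximum relevant information per bit.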
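For concreteness, the "exact numerical solution" on a synthetic distribution refers to the kind of computation sketched below: the classic self-consistent (Blahut-Arimoto-style) iterations for the IB problem on a small discrete $p(x,y)$. This is an illustrative sketch under stated assumptions, not the authors' code; the random synthetic distribution and all names are chosen here for the example. Sweeping $\beta$ across the onset, $I(T;Y)$ stays pinned at zero below the critical value and turns on above it, which is the behavior the perturbation theory captures analytically.

```python
import numpy as np

def ib_iterate(p_xy, beta, n_t, n_iter=2000, seed=0, eps=1e-12):
    """Self-consistent (Blahut-Arimoto-style) iterations for the IB problem.

    p_xy : (n_x, n_y) joint distribution of data X and relevance variable Y.
    beta : trade-off parameter; larger beta favors relevance over compression.
    n_t  : cardinality of the bottleneck variable T.
    Returns the encoder p(t|x) and the informations I(X;T), I(T;Y) in nats.
    """
    rng = np.random.default_rng(seed)
    n_x, n_y = p_xy.shape
    p_x = p_xy.sum(axis=1)                       # marginal p(x)
    p_y_given_x = p_xy / (p_x[:, None] + eps)    # conditional p(y|x)

    # Random initial encoder p(t|x), rows normalized over t.
    q_t_given_x = rng.random((n_x, n_t))
    q_t_given_x /= q_t_given_x.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        q_t = p_x @ q_t_given_x                  # marginal p(t)
        # Decoder via Bayes: p(y|t) = sum_x p(y|x) p(t|x) p(x) / p(t)
        q_y_given_t = (q_t_given_x * p_x[:, None]).T @ p_y_given_x
        q_y_given_t /= (q_t[:, None] + eps)
        # Encoder update: p(t|x) proportional to p(t) exp(-beta * KL(p(y|x) || p(y|t)))
        kl = np.array([[np.sum(p_y_given_x[x] *
                               (np.log(p_y_given_x[x] + eps) -
                                np.log(q_y_given_t[t] + eps)))
                        for t in range(n_t)] for x in range(n_x)])
        log_q = np.log(q_t + eps)[None, :] - beta * kl
        log_q -= log_q.max(axis=1, keepdims=True)   # numerical stability
        q_t_given_x = np.exp(log_q)
        q_t_given_x /= q_t_given_x.sum(axis=1, keepdims=True)

    def mi(p_joint):
        # Mutual information (in nats) of a 2D joint distribution.
        pa, pb = p_joint.sum(1), p_joint.sum(0)
        mask = p_joint > 0
        return float(np.sum(p_joint[mask] *
                            np.log(p_joint[mask] / np.outer(pa, pb)[mask])))

    p_xt = q_t_given_x * p_x[:, None]                       # joint p(x,t)
    p_ty = (q_t_given_x * p_x[:, None]).T @ p_y_given_x     # joint p(t,y)
    return q_t_given_x, mi(p_xt), mi(p_ty)

# Example: a random synthetic joint distribution; sweep beta across the
# learning onset and watch I(T;Y) turn on from zero.
rng = np.random.default_rng(1)
p_xy = rng.random((4, 3))
p_xy /= p_xy.sum()
for beta in (0.5, 1.0, 2.0, 4.0, 8.0):
    _, ixt, ity = ib_iterate(p_xy, beta, n_t=4)
    print(f"beta={beta:4.1f}  I(X;T)={ixt:.4f}  I(T;Y)={ity:.4f}")
```

Near the onset the fixed point of these iterations becomes nearly degenerate and convergence slows, which is one reason an analytical perturbation theory is a useful complement to the numerical solution there.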