This paper explores the relationship between the condition number of a neural network's weight tensor and the extent of information encoded by the associated processing unit, viewed through the lens of information theory. It argues that a high condition number, while not sufficient to guarantee effective knowledge encoding, may indicate that the unit has learned to selectively amplify and compress information. This intuition is formalized for linear units with Gaussian inputs, linking the condition number and the transformation's log-volume scaling factor to the output entropy and to the geometric properties of the learned transformation. The analysis demonstrates that, for a fixed weight norm, a singular-value spectrum whose mass is concentrated in a few dominant directions (i.e., a high condition number) corresponds to reduced overall information transfer, indicating a specialized and efficient encoding strategy. Furthermore, the entropy bound of the linear stage upper-bounds the post-activation information for contractive, element-wise nonlinearities, supporting the condition number as a scale-invariant proxy for encoding capacity in practical neural networks. An empirical case study applies these principles to guide selective fine-tuning of Large Language Models for both a new task and a new input modality. The experiments show that the proposed method, named KappaTune, effectively mitigates catastrophic forgetting. Unlike many existing mitigation methods for catastrophic forgetting, which rely on access to pre-training statistics that are often unavailable, the proposed selective fine-tuning approach does not require such statistics.
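To make the linear-Gaussian argument concrete, the following is a minimal worked sketch; it assumes a full-rank square weight matrix $W \in \mathbb{R}^{n \times n}$, a Gaussian input $x \sim \mathcal{N}(0, \Sigma_x)$, the Frobenius norm as the fixed weight norm, and an invertible contractive nonlinearity, so the notation is illustrative rather than the paper's own. For $y = Wx$,
\[
  h(y) = h(x) + \log\lvert\det W\rvert = h(x) + \sum_{i=1}^{n} \log \sigma_i(W),
  \qquad
  \kappa(W) = \frac{\sigma_{\max}(W)}{\sigma_{\min}(W)}.
\]
With $\lVert W \rVert_F^2 = \sum_i \sigma_i^2$ held fixed, the log-volume term $\sum_i \log \sigma_i$ is maximized when all singular values are equal ($\kappa = 1$); concentrating the spectral mass in a few directions ($\kappa \gg 1$) lowers it and therefore lowers $h(y)$. For an invertible, contractive element-wise nonlinearity $\phi$ (e.g., $\tanh$) with $0 < \phi' \le 1$,
\[
  h(\phi(y)) = h(y) + \mathbb{E}\!\left[\sum_{i=1}^{n} \log \phi'(y_i)\right] \le h(y),
\]
so the entropy of the linear stage upper-bounds the post-activation information.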
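The selective fine-tuning step can likewise be sketched in code. The snippet below is an illustrative PyTorch sketch rather than the KappaTune implementation itself: the helper names (condition_number, select_parameters_by_kappa), the tune_fraction parameter, and the rule of tuning the lowest-condition-number matrices while freezing the highest-conditioned (most specialized) ones are assumptions made here for concreteness.

import torch
import torch.nn as nn

def condition_number(weight: torch.Tensor) -> float:
    # Ratio of the largest to the smallest singular value of a 2-D weight matrix.
    s = torch.linalg.svdvals(weight.detach().float())
    return (s.max() / s.min().clamp_min(1e-12)).item()

def select_parameters_by_kappa(model: nn.Module, tune_fraction: float = 0.25):
    # Score every linear weight by its condition number, then mark only the
    # lowest-kappa fraction as trainable (assumed selection rule) and freeze the rest.
    scored = []
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            scored.append((condition_number(module.weight), name, module))
    scored.sort(key=lambda t: t[0])  # ascending condition number

    n_tune = max(1, int(tune_fraction * len(scored)))
    tuned = {name for _, name, _ in scored[:n_tune]}

    for _, name, module in scored:
        trainable = name in tuned
        module.weight.requires_grad_(trainable)
        if module.bias is not None:
            module.bias.requires_grad_(trainable)
    return tuned

# Toy usage on a stand-in model; with an LLM one would iterate over its
# attention and MLP projection matrices instead.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(),
                      nn.Linear(32, 16), nn.ReLU(),
                      nn.Linear(16, 4))
print(select_parameters_by_kappa(model, tune_fraction=0.34))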