Empirical power-law scaling has been widely observed across modern deep learning systems, yet its theoretical origins and scope of validity remain incompletely understood. The Generalized Resolution-Shell Dynamics (GRSD) framework models learning as spectral energy transport across logarithmic resolution shells, providing a coarse-grained dynamical description of training. Within GRSD, power-law scaling corresponds to a particularly simple renormalized shell dynamics; however, such behavior is not automatic and requires additional structural properties of the learning process. In this work, we identify a set of sufficient conditions under which the GRSD shell dynamics admits a renormalizable coarse-grained description. These conditions constrain the learning configuration at multiple levels, including boundedness of gradient propagation in the computation graph, weak functional incoherence at initialization, controlled Jacobian evolution during training, and log-shift invariance of renormalized shell couplings. We further show that power-law scaling does not follow from renormalizability alone, but instead arises as a rigidity consequence: once log-shift invariance is combined with the intrinsic time-rescaling covariance of gradient flow, the renormalized GRSD velocity field is forced into a power-law form.
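As a minimal sketch of the rigidity mechanism (the notation here is assumed for illustration and is not fixed by the abstract: $\lambda = \log k$ denotes log-resolution, $v(\lambda)$ the renormalized shell velocity field, and $\gamma$ the exponent selected by the time-rescaling covariance of gradient flow), suppose log-shift invariance holds up to a multiplicative factor,
\[
v(\lambda + a) \;=\; e^{\gamma a}\, v(\lambda) \qquad \text{for all shifts } a \in \mathbb{R}.
\]
Setting $g(\lambda) := e^{-\gamma \lambda}\, v(\lambda)$ yields $g(\lambda + a) = e^{-\gamma(\lambda + a)}\, e^{\gamma a}\, v(\lambda) = g(\lambda)$ for every $a$, so $g$ is a constant $C$ and
\[
v(\lambda) \;=\; C\, e^{\gamma \lambda} \;=\; C\, k^{\gamma},
\]
i.e., the velocity field is forced into a power-law form in the resolution scale $k$.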