The recently proposed Gated Linear Networks present a tractable nonlinear network architecture and exhibit interesting capabilities such as learning with local error signals and reduced forgetting in sequential learning. In this work, we introduce a novel gating architecture, named Globally Gated Deep Linear Networks (GGDLNs), in which gating units are shared among all processing units in each layer, thereby decoupling the architecture of the nonlinear but unlearned gating from that of the learned linear processing motifs. We derive exact equations for the generalization properties of these networks in the finite-width thermodynamic limit, defined by $P, N \rightarrow \infty$, $P/N \sim O(1)$, where $P$ and $N$ are the training sample size and the network width, respectively. We find that the statistics of the network predictor can be expressed in terms of kernels that, compared to the GP kernels, undergo shape renormalization through a data-dependent matrix. Our theory accurately captures the behavior of finite-width GGDLNs trained with gradient descent dynamics. We show that kernel shape renormalization gives rise to rich generalization properties with respect to network width, depth, and $L_2$ regularization amplitude. Interestingly, networks with a sufficient number of gating units behave similarly to standard ReLU networks. Although the gating units in the model do not participate in supervised learning, we show the utility of unsupervised learning of the gating parameters. Additionally, our theory allows us to evaluate the network's ability to learn multiple tasks by incorporating task-relevant information into the gating units. In summary, our work provides the first exact theoretical solution of learning in a family of nonlinear networks with finite width. The rich and diverse behavior of GGDLNs suggests that they are useful, analytically tractable models of learning single and multiple tasks in finite-width nonlinear deep networks.
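For concreteness, below is a minimal sketch of the forward pass the abstract describes: each layer's effective weight matrix is a gate-weighted sum of learned linear maps, with the gating units shared across all processing units in the layer and held fixed during training. The specific gating choice (Heaviside functions of random projections of the raw input) and the helper names `make_gates` and `ggdln_forward` are illustrative assumptions, not prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_gates(d_in, n_gates, rng):
    # Fixed (unlearned) gating units: one shared set per layer.
    # Assumed here: Heaviside gates on random projections of the raw input.
    V = rng.standard_normal((n_gates, d_in)) / np.sqrt(d_in)
    return lambda x0: (V @ x0 > 0).astype(float)  # shape: (n_gates,)

def ggdln_forward(x0, layers, gate_fns):
    """Hypothetical GGDLN forward pass.

    layers[l] has shape (n_gates, N_out, N_in): a stack of learned linear
    maps. The layer's effective weight matrix is their gate-weighted sum,
    so the nonlinearity enters only through the unlearned gates.
    """
    x = x0
    for W, gates in zip(layers, gate_fns):
        g = gates(x0)                          # gates depend on the raw input
        W_eff = np.einsum('n,nij->ij', g, W)   # shared gating for all units
        x = W_eff @ x                          # linear processing of activity
    return x

# Usage: a depth-2 network with width 50 and 4 gating units per layer.
d_in, width, n_gates, depth = 10, 50, 4, 2
dims = [d_in, width, 1]
gate_fns = [make_gates(d_in, n_gates, rng) for _ in range(depth)]
layers = [rng.standard_normal((n_gates, dims[l + 1], dims[l])) / np.sqrt(dims[l])
          for l in range(depth)]
print(ggdln_forward(rng.standard_normal(d_in), layers, gate_fns))
```

In this sketch only the stacked linear maps `layers` would be trained; the gate projections stay fixed, matching the abstract's separation between unlearned nonlinear gating and learned linear processing.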