Residual mappings have been shown to perform representation learning in the first layers and iterative feature refinement in higher layers. This interplay, combined with their stabilizing effect on gradient norms, enables them to train very deep networks. In this paper, we take a step further and introduce entangled residual mappings to generalize the structure of residual connections and evaluate their role in iterative representation learning. An entangled residual mapping replaces the identity skip connection with a specialized entangled mapping, such as an orthogonal, sparse, or structural correlation matrix, that shares key attributes (eigenvalues, structure, and Jacobian norm) with the identity mapping. We show that while entangled mappings can preserve the iterative refinement of features across various deep models, they influence the representation learning process in convolutional networks differently than in attention-based models and recurrent neural networks. In general, we find that for CNNs and Vision Transformers, entangled sparse mappings can help generalization, while orthogonal mappings hurt performance. For recurrent networks, orthogonal residual mappings form an inductive bias for time-variant sequences, which degrades accuracy on time-invariant tasks.
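The core idea can be sketched in a few lines: a residual block computes y = f(x) + Mx, where the standard choice M = I is replaced by an entangled mapping that keeps identity-like properties. The sketch below is a minimal NumPy illustration under our own assumptions; the orthogonal mapping is built via QR, and the sparse construction (a spectrally normalized random sparse matrix) is illustrative rather than the paper's exact parameterization.

```python
import numpy as np

def residual_block(x, weight, skip):
    """Generalized residual block: y = relu(weight @ x) + skip @ x.
    Passing skip = np.eye(dim) recovers the standard identity connection."""
    return np.maximum(weight @ x, 0.0) + skip @ x

def orthogonal_skip(dim, seed=0):
    # QR of a random Gaussian matrix yields an orthogonal Q: like the
    # identity, its eigenvalues have unit magnitude and its Jacobian
    # (spectral) norm is 1, so gradient norms stay well behaved.
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q

def sparse_skip(dim, density=0.25, seed=0):
    # Illustrative sparse variant: a random sparse matrix rescaled to
    # spectral norm 1, matching the identity's Jacobian norm.
    rng = np.random.default_rng(seed)
    m = rng.standard_normal((dim, dim)) * (rng.random((dim, dim)) < density)
    return m / max(np.linalg.norm(m, 2), 1e-8)

# Usage: one forward pass through a block with an orthogonal skip.
dim = 8
rng = np.random.default_rng(1)
x = rng.standard_normal(dim)
W = rng.standard_normal((dim, dim)) / np.sqrt(dim)
y = residual_block(x, W, orthogonal_skip(dim))
```

Swapping `orthogonal_skip(dim)` for `sparse_skip(dim)` or `np.eye(dim)` changes only the skip path, which is what lets the same architecture be compared across mapping types.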