In the years since the introduction of ResNet, the skip connection has become the de facto standard in the design of modern architectures, owing to its widespread adoption, ease of optimization, and proven performance. Prior work has explained the effectiveness of the skip connection mechanism from different perspectives. In this work, we take a close look at the behavior of models with skip connections, which can be formulated as a learnable Markov chain. An efficient Markov chain is preferred because it maps the input data to the target domain more effectively. However, even when a model can be interpreted as a Markov chain, existing SGD-based optimizers, which are prone to getting trapped in local optima, do not guarantee that it is optimized as an efficient one. To move toward a more efficient Markov chain, we propose a simple routine, the penal connection, that turns any residual-like model into a learnable Markov chain. The penal connection can also be viewed as a particular form of model regularization, and it can be implemented with one line of code in the most popular deep learning frameworks\footnote{Source code: \url{https://github.com/densechen/penal-connection}}. Encouraging experimental results on multi-modal translation and image recognition empirically support our conjecture of the learnable Markov chain view and demonstrate the superiority of the proposed penal connection.