We propose a reparameterization of LSTM that brings the benefits of batch normalization to recurrent neural networks. Whereas previous works only apply batch normalization to the input-to-hidden transformation of RNNs, we demonstrate that it is both possible and beneficial to batch-normalize the hidden-to-hidden transition, thereby reducing internal covariate shift between time steps. We evaluate our proposal on various sequential problems such as sequence classification, language modeling and question answering. Our empirical results show that our batch-normalized LSTM consistently leads to faster convergence and improved generalization.
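To make the proposal concrete, below is a minimal NumPy sketch of a single batch-normalized LSTM step, not the paper's reference implementation. It assumes fused gate weight matrices, training-mode batch statistics, and illustrative names (`bn_lstm_step`, `gamma_x`, `gamma_h`, `gamma_c`). The key point it illustrates is that the hidden-to-hidden pre-activation is normalized separately from the input-to-hidden one, and the cell state is normalized again before the output nonlinearity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def batch_norm(x, gamma, beta, eps=1e-5):
    # Normalize each feature over the batch dimension (training-mode statistics).
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def bn_lstm_step(x_t, h_prev, c_prev, params):
    """One step of a batch-normalized LSTM (illustrative sketch).

    The input-to-hidden term (x_t @ W_x) and the hidden-to-hidden term
    (h_prev @ W_h) are normalized separately before being summed; the
    cell state is normalized again before the output nonlinearity.
    """
    W_x, W_h, b = params["W_x"], params["W_h"], params["b"]
    gx, gh, gc = params["gamma_x"], params["gamma_h"], params["gamma_c"]
    bc = params["beta_c"]

    # beta is fixed to 0 for the two pre-activation BN terms; the bias b
    # already provides the additive shift.
    pre = (batch_norm(x_t @ W_x, gx, 0.0)
           + batch_norm(h_prev @ W_h, gh, 0.0)
           + b)

    f, i, o, g = np.split(pre, 4, axis=1)          # forget, input, output, candidate
    c_t = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    h_t = sigmoid(o) * np.tanh(batch_norm(c_t, gc, bc))
    return h_t, c_t

# Example usage with arbitrary sizes (batch B, input D, hidden H).
rng = np.random.default_rng(0)
B, D, H = 32, 10, 20
params = {
    "W_x": rng.normal(scale=0.1, size=(D, 4 * H)),
    "W_h": rng.normal(scale=0.1, size=(H, 4 * H)),
    "b": np.zeros(4 * H),
    "gamma_x": np.full(4 * H, 0.1),  # small initial gamma, as recommended for recurrent BN
    "gamma_h": np.full(4 * H, 0.1),
    "gamma_c": np.full(H, 0.1),
    "beta_c": np.zeros(H),
}
h, c = np.zeros((B, H)), np.zeros((B, H))
x = rng.normal(size=(B, D))
h, c = bn_lstm_step(x, h, c, params)
```

For brevity, the sketch always uses per-batch statistics; a complete implementation would also track population statistics for use at test time.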