Speech recognition has benefited substantially from deep learning, with modern Recurrent Neural Networks (RNNs) delivering significant gains. The most popular RNNs are Long Short-Term Memory networks (LSTMs), which typically reach state-of-the-art performance on many tasks thanks to their ability to learn long-term dependencies and their robustness to vanishing gradients. Nevertheless, LSTMs have a rather complex design with three multiplicative gates, which can hinder their efficient implementation. A recent attempt to simplify LSTMs led to Gated Recurrent Units (GRUs), which rely on just two multiplicative gates. This paper builds on these efforts by further revising GRUs and proposing a simplified architecture that is potentially better suited to speech recognition. The contribution of this work is twofold. First, we propose removing the reset gate from the GRU design, resulting in a more efficient single-gate architecture. Second, we propose replacing tanh with ReLU activations in the state update equations. Results show that, in our implementation, the revised architecture reduces the per-epoch training time by more than 30% and consistently improves recognition performance across different tasks, input features, and noisy conditions when compared to a standard GRU.
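To make the two proposed changes concrete, the following is a minimal NumPy sketch of a single recurrence step under the assumptions stated in the abstract: the reset gate is dropped and ReLU replaces tanh in the candidate-state computation. The weight and bias names (Wz, Uz, Wh, Uh, bz, bh) are illustrative, and implementation details not mentioned here (e.g., any normalization or regularization) are omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def single_gate_relu_step(x_t, h_prev, Wz, Uz, Wh, Uh, bz, bh):
    """One step of the single-gate recurrent unit sketched in the abstract
    (names and shapes are illustrative, not the authors' exact code)."""
    # Update gate: the only remaining multiplicative gate
    z_t = sigmoid(Wz @ x_t + Uz @ h_prev + bz)
    # Candidate state: ReLU instead of tanh, no reset gate on h_prev
    h_tilde = np.maximum(0.0, Wh @ x_t + Uh @ h_prev + bh)
    # Interpolate between the previous state and the candidate state
    return z_t * h_prev + (1.0 - z_t) * h_tilde
```

Compared to a standard GRU step, this removes one gate's matrix products and nonlinearity per time step, which is consistent with the reported reduction in per-epoch training time.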