This paper proposes a new regularization algorithm referred to as macro-block dropout. Overfitting has long been a difficult problem in training large neural network models. The dropout technique has proven to be simple yet very effective for regularization, preventing complex co-adaptations during training. In our work, we define a macro-block that contains a large number of units from the input to a Recurrent Neural Network (RNN). Rather than applying dropout to each unit, we apply random dropout to each macro-block. Even when the average dropout rate is kept constant, this algorithm has the effect of applying a different dropout rate to each layer, which yields better regularization. In our experiments using a Recurrent Neural Network-Transducer (RNN-T), this algorithm shows relative Word Error Rate (WER) improvements of 4.30% and 6.13% over the conventional dropout on LibriSpeech test-clean and test-other, respectively. With an Attention-based Encoder-Decoder (AED) model, it shows relative WER improvements of 4.36% and 5.85% over the conventional dropout on the same test sets.
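To make the idea concrete, the following is a minimal sketch, not the authors' implementation, of how macro-block dropout might be applied to the input of an RNN layer. It assumes the feature dimension is partitioned into a fixed number of contiguous macro-blocks, each dropped independently with probability p, with inverted-dropout rescaling of the surviving blocks; the function name, block partitioning, and scaling choice are illustrative assumptions.

```python
import numpy as np

def macro_block_dropout(x, num_blocks=4, p=0.2, training=True, rng=None):
    """Hypothetical sketch of macro-block dropout on an RNN layer input.

    x          : array of shape (batch, features), input to an RNN layer
    num_blocks : number of contiguous macro-blocks along the feature axis
    p          : probability of dropping each macro-block
    """
    if not training or p == 0.0:
        return x
    rng = np.random.default_rng() if rng is None else rng
    batch, features = x.shape
    # One keep/drop decision per (example, macro-block), not per unit.
    keep = rng.random((batch, num_blocks)) >= p            # (batch, num_blocks)
    # Map each feature index to the macro-block that contains it.
    block_ids = np.arange(features) * num_blocks // features
    mask = keep[:, block_ids].astype(x.dtype)              # (batch, features)
    # Inverted-dropout scaling keeps the expected activation unchanged.
    return x * mask / (1.0 - p)

# Example usage on a random batch of RNN inputs.
x = np.random.randn(8, 256).astype(np.float32)
y = macro_block_dropout(x, num_blocks=4, p=0.2, training=True)
```

Because the random mask is drawn per macro-block, the realized fraction of dropped units varies from layer to layer and step to step around the average rate p, which is the layer-wise variation the abstract credits for the improved regularization.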