Regularizing intermediate layer outputs (ILO) through multitask training on the encoder side has been shown to improve results across a wide range of end-to-end ASR frameworks. In this paper, we propose a novel way to perform ILO-regularized training. Instead of using conventional multitask methods, which add training overhead, we feed the intermediate layer output directly to the decoder: during training, the decoder takes as input not only the output of the final encoder layer but also the output of an intermediate encoder layer. Because the proposed method "regularizes" the encoder and the decoder simultaneously, the network is trained more thoroughly, which consistently yields better results than both the ILO-based CTC method and the original attention-based model trained without the proposed method.
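The training scheme described above can be illustrated with a toy sketch. It is not the paper's implementation. The abstract does not say how the two decoder passes are combined, so the interpolation weight `ilo_weight`, the tapped layer index `tap`, and the helper names `run_encoder` and `ilo_decoder_loss` are all assumptions made for illustration. The encoder layers and the decoder loss are plain Python callables standing in for real network modules.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]
DecoderLoss = Callable[[Vector, Vector], float]


def run_encoder(layers: Sequence[Layer], x: Vector, tap: int) -> Tuple[Vector, Vector]:
    """Run every encoder layer and return (final output, output of layer `tap`).

    `tap` is 1-indexed and hypothetical: which layer to tap is not given in the abstract.
    """
    if not 1 <= tap <= len(layers):
        raise ValueError("tap must index an existing encoder layer")
    hidden = x
    intermediate = x
    for index, layer in enumerate(layers, start=1):
        hidden = layer(hidden)
        if index == tap:
            intermediate = hidden
    return hidden, intermediate


def ilo_decoder_loss(
    layers: Sequence[Layer],
    decoder_loss: DecoderLoss,
    x: Vector,
    target: Vector,
    tap: int,
    ilo_weight: float = 0.5,
) -> float:
    """Training loss when one shared decoder sees both encoder outputs.

    The same decoder is applied to the final and the intermediate encoder
    output; the two losses are interpolated with `ilo_weight` (an assumption).
    """
    final, intermediate = run_encoder(layers, x, tap)
    loss_final = decoder_loss(final, target)
    loss_ilo = decoder_loss(intermediate, target)
    return (1.0 - ilo_weight) * loss_final + ilo_weight * loss_ilo


# Toy stand-ins: each "layer" adds 1 to every feature, and the "decoder loss"
# is the mean squared error between the encoder output and the target.
def add_one(hidden: Vector) -> Vector:
    return [value + 1.0 for value in hidden]


def mse(encoded: Vector, target: Vector) -> float:
    return sum((e - t) ** 2 for e, t in zip(encoded, target)) / len(target)


toy_layers = [add_one, add_one, add_one, add_one]
loss = ilo_decoder_loss(toy_layers, mse, [0.0, 0.0], [4.0, 4.0], tap=2)
print(loss)
```

Setting `ilo_weight=0.0` recovers ordinary training on the final encoder output only. Since the abstract says the intermediate output is used "during training", inference in this sketch would presumably use the final encoder output alone.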