In this paper, we present TrimTail, a simple but effective emission regularization method for reducing the latency of streaming ASR models. The core idea of TrimTail is to apply a length penalty directly to the spectrogram of input utterances by trimming trailing frames (see Fig. 1-(b)), which requires no alignment. TrimTail is computationally cheap, can be applied online, and can be optimized with any training loss and any model architecture on any dataset without extra effort; we demonstrate this by applying it to various end-to-end streaming ASR networks trained with either CTC loss [1] or Transducer loss [2]. We achieve a 100$\sim$200 ms latency reduction with equal or even better accuracy on both Aishell-1 and Librispeech. Moreover, by using TrimTail, we can achieve a 400 ms algorithmic improvement in User Sensitive Delay (USD) with an accuracy loss of less than 0.2.
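The trimming step described above can be illustrated with a minimal sketch. The abstract only states that trailing frames of the input spectrogram are trimmed; the function name `trim_tail`, the `max_trim` parameter, the uniform sampling of the trim length, and the rule of keeping at least one frame are all assumptions made for illustration, not details confirmed by the text.

```python
import numpy as np


def trim_tail(spec, max_trim, rng):
    """Drop a random number of trailing frames from a spectrogram.

    spec: array of shape (num_frames, num_bins).
    max_trim: largest number of frames that may be removed (hypothetical knob).
    rng: a numpy.random.Generator used to sample the trim length.
    """
    num_frames = spec.shape[0]
    # Assumption: the trim length is drawn uniformly from [0, max_trim].
    t = int(rng.integers(0, max_trim + 1))
    # Assumption: never remove every frame; keep at least one when possible.
    t = max(0, min(t, num_frames - 1))
    # Only the input is shortened; labels and alignments are left untouched.
    return spec[: num_frames - t]


rng = np.random.default_rng(0)
spec = np.ones((100, 80))           # 100 frames, 80 filterbank bins
trimmed = trim_tail(spec, 20, rng)  # between 80 and 100 frames remain
```

Because the operation is a single slice along the time axis, it can run on the fly inside a data loader, which is consistent with the claim that the method is cheap and independent of the loss and model architecture.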