Data augmentation is a technique to generate new training data from existing data. We evaluate the simple and cost-effective method of concatenating the original data examples to build new training instances. Continued training with such augmented data improves off-the-shelf Transformer and Conformer models that were optimized on the original data only. We demonstrate considerable improvements on the LibriSpeech-960h test sets (WER 2.83 and 6.87 for test-clean and test-other), which carry over to models combined with shallow fusion (WER 2.55 and 6.27). Our method of continued training also leads to improvements of up to 0.9 WER on the ASR part of CoVoST-2 for four non-English languages, and we observe that the gains are highly dependent on the size of the original training data. We compare different concatenation strategies and find that our method does not need speaker information to achieve its improvements. Finally, we demonstrate on two datasets that our method also works for speech translation tasks.
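To make the concatenation idea concrete, the following is a minimal sketch, not the authors' implementation: it assumes utterances are stored as (waveform, transcript) pairs and joins the audio along the time axis and the transcripts with a space. The function name `concat_augment` and all parameters are illustrative. Pairs are sampled without any speaker constraint, in line with the finding that speaker information is not needed.

```python
import random
import numpy as np

def concat_augment(dataset, num_new, rng=None):
    """Build new training examples by concatenating pairs of existing ones.

    dataset: list of (waveform: np.ndarray, transcript: str) pairs
    num_new: number of augmented examples to generate
    """
    rng = rng or random.Random(0)
    augmented = []
    for _ in range(num_new):
        # Sample two distinct utterances at random; no speaker matching,
        # since the abstract reports speaker information is not required.
        (wav_a, txt_a), (wav_b, txt_b) = rng.sample(dataset, 2)
        wav = np.concatenate([wav_a, wav_b])  # join audio along time axis
        txt = f"{txt_a} {txt_b}"              # join transcripts with a space
        augmented.append((wav, txt))
    return augmented

# Usage example with dummy 16 kHz utterances:
data = [
    (np.zeros(16000, dtype=np.float32), "hello world"),
    (np.zeros(24000, dtype=np.float32), "good morning"),
    (np.zeros(8000, dtype=np.float32), "thank you"),
]
new_data = concat_augment(data, num_new=2)
print([transcript for _, transcript in new_data])
```

The augmented examples would then be mixed with the original data for continued training of an already-converged model, per the abstract's description.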