We introduce ImportantAug, a technique for augmenting training data for speech classification and recognition models by adding noise to unimportant regions of the speech while leaving important regions clean. Importance is predicted for each utterance by a data augmentation agent trained to maximize the amount of noise it adds while minimizing the impact on recognition performance. We demonstrate the method on version 2 of the Google Speech Commands (GSC) dataset. On the standard GSC test set, it achieves a 23.3% relative error rate reduction compared to conventional noise augmentation, which applies noise to speech without regard to where it might be most effective. It also provides a 25.4% error rate reduction compared to a baseline without data augmentation. Additionally, ImportantAug outperforms both conventional noise augmentation and the baseline on two test sets with additional noise added.
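The core idea of adding noise only where speech is unimportant can be illustrated with a minimal sketch. It is not the implementation behind the reported results: it assumes a per-sample importance mask in [0, 1] applied in the time domain, and the function name `add_noise_to_unimportant` and the hand-written mask are hypothetical. In the method itself, the mask would come from the trained data augmentation agent.

```python
import random


def add_noise_to_unimportant(speech, noise, importance):
    """Mix noise into speech, attenuating it where importance is high.

    speech, noise, importance: equal-length lists of floats.
    importance[i] is assumed to lie in [0, 1]; 1 means the sample is
    fully protected, 0 means the noise is added at full strength.
    """
    if not (len(speech) == len(noise) == len(importance)):
        raise ValueError("all inputs must have the same length")
    return [s + (1.0 - m) * n for s, n, m in zip(speech, noise, importance)]


# Toy example: the two middle samples are marked important and stay clean.
rng = random.Random(0)
speech = [0.0, 0.5, 1.0, 0.5, 0.0, 0.0]
importance = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]  # hypothetical agent output
noise = [rng.gauss(0.0, 0.1) for _ in speech]
augmented = add_noise_to_unimportant(speech, noise, importance)
```

Conventional noise augmentation corresponds to an all-zero mask in this sketch, so every sample receives the full noise regardless of how much it matters for recognition.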