Data augmentation is a ubiquitous technique used to provide robustness to automatic speech recognition (ASR) training. However, even as much of the ASR training process has become automated and more "end-to-end", the data augmentation policy (which augmentation functions to use, and how to apply them) remains hand-crafted. We present G-Augment, a technique that defines the augmentation space as directed acyclic graphs (DAGs) and searches over this space to optimize the augmentation policy itself. We show that, given the same computational budget, policies produced by G-Augment outperform SpecAugment policies obtained by random search on fine-tuning tasks on CHiME-6 and AMI. G-Augment also establishes a new state-of-the-art ASR performance on the CHiME-6 evaluation set (30.7% WER). We further demonstrate that G-Augment policies transfer better across warm-start to cold-start training and across model sizes than random-searched SpecAugment policies.