Recently, neural architecture search (NAS) has been successfully applied to image classification, natural language processing, and automatic speech recognition (ASR), finding architectures that outperform human-designed ones and achieve state-of-the-art (SOTA) results. Given a pre-defined search space and a search algorithm, NAS derives a SOTA, data-specific architecture by evaluating candidates on validation data. Inspired by this success, we propose a NAS-based ASR framework consisting of a search space and a differentiable search algorithm, Differentiable Architecture Search (DARTS). Our search space follows the convolution-augmented Transformer (Conformer) backbone, a more expressive ASR architecture than those used in existing NAS-based ASR frameworks. To further improve performance, we employ a regulation method called Dynamic Search Schedule (DSS). On the widely used Mandarin benchmark AISHELL-1, our best searched architecture significantly outperforms the baseline Conformer model, with a relative character error rate (CER) improvement of about 11%, and search cost comparisons show that our method is highly efficient.
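For background (this is the standard DARTS relaxation from Liu et al., 2019, not a detail stated in this abstract): DARTS makes the discrete choice among candidate operations differentiable by replacing each edge's operation with a softmax-weighted mixture over the candidate set $\mathcal{O}$, parameterized by learnable architecture weights $\alpha$:

$$\bar{o}^{(i,j)}(x) = \sum_{o \in \mathcal{O}} \frac{\exp\big(\alpha_o^{(i,j)}\big)}{\sum_{o' \in \mathcal{O}} \exp\big(\alpha_{o'}^{(i,j)}\big)}\, o(x)$$

The $\alpha$ weights are optimized on validation data jointly with the network weights; after search, the discrete architecture is recovered by keeping the operation with the largest $\alpha$ on each edge.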