Neural architecture search (NAS) has been successfully applied to tasks such as image classification and language modeling to find efficient, high-performance network architectures. In the ASR field, especially end-to-end ASR, the related research is still in its infancy. In this work, we focus on applying NAS to the most popular manually designed model, Conformer, and propose an efficient ASR model search method that benefits from the natural advantage of differentiable architecture search (Darts) in reducing computational overhead. We fuse the Darts mutator and Conformer blocks to form a complete search space, within which a modified architecture called the Darts-Conformer cell is found automatically. The entire search process on the AISHELL-1 dataset costs only 0.7 GPU days. Replacing the Conformer encoder with stacked searched cells, we obtain an end-to-end ASR model (named Darts-Conformer) that outperforms the Conformer baseline by 4.7\% on the open-source AISHELL-1 dataset. Besides, we verify the transferability of the architecture searched on a small dataset to a larger 2k-hour dataset. To the best of our knowledge, this is the first successful attempt to apply gradient-based architecture search to an attention-based encoder-decoder ASR model.
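The gradient-based search referred to above rests on the DARTS relaxation: the discrete choice among candidate operations in a cell is replaced by a softmax-weighted mixture, so the architecture parameters can be trained jointly with the network weights and the strongest operation is kept afterwards. The following is a minimal, self-contained sketch of that idea; the operation names and scalar "features" are illustrative stand-ins, not the actual Conformer sub-blocks used in the Darts-Conformer search space.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical candidate operations on a scalar feature; in the real
# search space these would be Conformer sub-blocks (convolution,
# self-attention, feed-forward, etc.).
candidate_ops = {
    "identity":  lambda x: x,
    "conv_like": lambda x: 2.0 * x,   # stand-in for a convolution branch
    "attn_like": lambda x: x + 1.0,   # stand-in for an attention branch
}

def mixed_op(x, alpha):
    """DARTS continuous relaxation: instead of picking one op,
    output the softmax(alpha)-weighted sum of all candidates,
    making the choice differentiable in alpha."""
    weights = softmax(alpha)
    return sum(w * op(x) for w, op in zip(weights, candidate_ops.values()))

def derive_op(alpha):
    """After search, discretize: keep the op with the largest weight."""
    names = list(candidate_ops)
    return names[max(range(len(alpha)), key=lambda i: alpha[i])]

alpha = [0.1, 1.5, 0.3]   # learnable architecture parameters (one per op)
y = mixed_op(2.0, alpha)  # forward pass through the relaxed cell
print(derive_op(alpha))   # -> conv_like
```

In a full search, `alpha` would be updated by gradient descent on the validation loss while the network weights are updated on the training loss, and the derived cell is then stacked to build the final encoder.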