Neural architecture search (NAS) has attracted much attention and has been explored for automatic speech recognition (ASR). In this work, we focus on streaming ASR scenarios and propose latency-controlled NAS for acoustic modeling. First, starting from the vanilla neural architecture, normal cells are converted to causal cells to control the total latency of the architecture. Second, a revised operation space with a smaller receptive field is proposed so that the final architecture has low latency. Extensive experiments show that: 1) based on the proposed neural architecture, networks with a medium latency of 550 ms and a low latency of 190 ms can be learned in the vanilla and revised operation spaces, respectively; 2) in the low-latency setting, the evaluation network achieves more than 19% relative improvement (averaged over the four test sets) compared with the hybrid CLDNN baseline on a 10k-hour large-scale dataset.
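To make the link between causal cells, receptive field, and latency concrete, the following is a minimal sketch of how the algorithmic lookahead of a stack of temporal convolutions could be estimated. It is not the architecture or the latency computation used in this work. The `ConvOp` class, the example layer stacks, the assumption of odd, symmetric kernels for non-causal operations, and the 10 ms frame shift are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ConvOp:
    """Hypothetical description of one temporal convolution in a cell."""
    kernel_size: int      # assumed odd when the op is not causal
    dilation: int = 1
    stride: int = 1       # temporal subsampling factor
    causal: bool = False  # causal ops look only at past and current frames


def lookahead_frames(ops: List[ConvOp]) -> int:
    """Future input frames needed to compute one output frame."""
    total = 0
    jump = 1  # spacing, in input frames, between neighbouring positions
    for op in ops:
        if not op.causal:
            # A symmetric kernel sees (k - 1) / 2 positions to the right,
            # each separated by the dilation.
            right_context = (op.kernel_size - 1) // 2 * op.dilation
            total += right_context * jump
        jump *= op.stride
    return total


def latency_ms(ops: List[ConvOp], frame_shift_ms: float = 10.0) -> float:
    """Lookahead latency, assuming a fixed frame shift (10 ms here)."""
    return lookahead_frames(ops) * frame_shift_ms


# Illustrative stacks only; these are not the searched architectures.
normal = [ConvOp(3), ConvOp(5, dilation=2), ConvOp(3)]
partly_causal = [ConvOp(3), ConvOp(5, dilation=2, causal=True), ConvOp(3)]
fully_causal = [ConvOp(3, causal=True), ConvOp(5, dilation=2, causal=True)]

for name, stack in [("normal", normal),
                    ("partly causal", partly_causal),
                    ("fully causal", fully_causal)]:
    print(f"{name}: {lookahead_frames(stack)} frames, {latency_ms(stack)} ms")
```

Under these assumptions, making an operation causal removes its contribution to the lookahead, and shrinking kernel sizes or dilations (a smaller receptive field) reduces the contribution of the operations that remain non-causal. These are the two levers the abstract describes.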