Neural Architecture Search (NAS), the process of automating architecture engineering, is an appealing next step toward advancing end-to-end Automatic Speech Recognition (ASR), replacing expert-designed networks with learned, task-specific architectures. In contrast to early computationally demanding NAS methods, recent gradient-based NAS methods, e.g., DARTS (Differentiable ARchiTecture Search), SNAS (Stochastic NAS) and ProxylessNAS, significantly improve NAS efficiency. In this paper, we make two contributions. First, we rigorously develop an efficient NAS method via Straight-Through (ST) gradients, called ST-NAS. Basically, ST-NAS uses the loss from SNAS but uses ST to back-propagate gradients through discrete variables to optimize the loss, which is not revealed in ProxylessNAS. Using ST gradients to support sub-graph sampling is a core element for achieving efficient NAS beyond DARTS and SNAS. Second, we successfully apply ST-NAS to end-to-end ASR. Experiments on the widely benchmarked 80-hour WSJ and 300-hour Switchboard datasets show that the ST-NAS induced architectures significantly outperform the human-designed architecture across the two datasets. Strengths of ST-NAS such as architecture transferability and low computation cost in memory and time are also reported.
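To make the core idea concrete, the following is a minimal illustrative sketch (not the authors' released implementation) of a straight-through gradient estimator over a discrete architecture choice: the forward pass samples and executes a single candidate operation (sub-graph sampling), while the backward pass routes gradients to all architecture logits through the softmax. The helper names `st_categorical_sample` and `mixed_edge` are hypothetical, introduced only for illustration.

```python
import torch
import torch.nn.functional as F

def st_categorical_sample(logits):
    """Sample a one-hot architecture choice.

    Forward: returns the hard (one-hot) sample.
    Backward: gradients flow through the softmax probabilities
    (straight-through estimator), so every logit is updated.
    """
    probs = F.softmax(logits, dim=-1)
    index = torch.multinomial(probs, 1).squeeze(-1)
    hard = F.one_hot(index, num_classes=logits.size(-1)).float()
    # Value equals `hard`; gradient w.r.t. `logits` comes from `probs`.
    return hard + probs - probs.detach()

def mixed_edge(x, ops, arch_logits):
    """Hypothetical mixed edge: only the sampled candidate op is executed,
    which keeps memory and time costs close to training a single sub-graph."""
    weights = st_categorical_sample(arch_logits)  # one-hot with ST gradients
    k = int(weights.argmax())
    return weights[k] * ops[k](x)                 # compute only the chosen op
```

Under these assumptions, only one candidate operation per edge is evaluated in each forward pass, unlike DARTS-style weighted sums over all candidates, which is the source of the reported savings in memory and time.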