The need for automatic design of deep neural networks has led to the emergence of neural architecture search (NAS), which has produced models that outperform manually designed ones. However, most existing NAS frameworks are designed for image processing tasks and lack structures and operations that are effective for voice activity detection (VAD). To discover improved VAD models through automatic design, we present the first NAS framework optimized for the VAD task. The proposed NAS-VAD framework expands the existing search space with an attention mechanism while adopting a compact macro-architecture with fewer cells. Experimental results show that the models discovered by NAS-VAD outperform existing manually designed VAD models on various synthetic and real-world datasets. Our code and models are available at https://github.com/daniel03c1/NAS_VAD.