Acoustic Event Classification (AEC) has been widely used in devices such as smart speakers and mobile phones for home safety or accessibility support. As AEC models run on an increasing number of devices with diverse computational resource constraints, it becomes increasingly expensive to develop models that are individually tuned to achieve an optimal accuracy/computation trade-off for each given constraint. In this paper, we introduce a Once-For-All (OFA) Neural Architecture Search (NAS) framework for AEC. Specifically, we first train a weight-sharing supernet that supports different model architectures, and then automatically search for a model that satisfies a given computational resource constraint. Our experimental results show that, with just one training run, the model found by NAS significantly outperforms both models trained individually from scratch (25.4% relative improvement) and models trained with knowledge distillation (7.3% relative improvement). We also found that for ultra-small models, the benefit of weight-sharing supernet training comes not only from the architecture search but also from the optimization itself.
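To make the two-stage recipe concrete, below is a minimal sketch of weight-sharing supernet training in a PyTorch-style setup. It is an illustration of the general OFA idea only, not the paper's actual implementation: the toy architecture, the uniform width sampler, and names such as `ElasticLinear` and `SuperNet` are all hypothetical.

```python
# Minimal sketch of weight-sharing supernet training in the OFA style
# (illustrative only; the paper's architecture and sampling strategy are
# not specified here). All class and variable names are hypothetical.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class ElasticLinear(nn.Module):
    """A linear layer whose output width can shrink at run time by slicing
    the shared weight matrix, so all sub-widths share one set of parameters."""
    def __init__(self, in_features, max_out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(max_out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(max_out_features))

    def forward(self, x, out_features):
        # Use only the first `out_features` rows of the shared weights.
        return F.linear(x, self.weight[:out_features], self.bias[:out_features])

class SuperNet(nn.Module):
    def __init__(self, in_dim=64, widths=(32, 64, 128), num_classes=10):
        super().__init__()
        self.widths = widths                      # candidate hidden widths
        self.hidden = ElasticLinear(in_dim, max(widths))
        self.head = nn.Linear(max(widths), num_classes)

    def forward(self, x, width):
        h = F.relu(self.hidden(x, width))
        # Zero-pad the hidden activation up to the max width so one shared
        # classifier head can serve every sub-network.
        h = F.pad(h, (0, max(self.widths) - width))
        return self.head(h)

model = SuperNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):
    x = torch.randn(8, 64)                       # dummy acoustic features
    y = torch.randint(0, 10, (8,))               # dummy event labels
    width = random.choice(model.widths)          # sample one sub-network
    loss = F.cross_entropy(model(x, width), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

After supernet training, the second stage would evaluate candidate sub-networks (e.g., by random or evolutionary search over `model.widths` in this toy setup) and pick the most accurate one that fits the target compute budget, with no further training from scratch.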