To date, attention-based models have been applied with great success to the keyword spotting problem. However, in light of recent advances in deep learning, the question arises whether self-attention is truly irreplaceable for recognizing speech keywords. We therefore explore the use of gated MLPs -- previously shown to be viable alternatives to transformers in vision tasks -- for the keyword spotting task. We evaluate our approach on the Google Speech Commands V2-35 dataset and show that it is possible to obtain performance comparable to the state of the art without any apparent use of self-attention.
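Since the abstract centers on gated MLPs as the attention-free building block, a minimal sketch of the standard gMLP block (Liu et al., "Pay Attention to MLPs") may be helpful for orientation. The PyTorch framing, the hyperparameters, and the input shapes below are illustrative assumptions, not the exact configuration used in this work; the spatial gating unit, which replaces self-attention with a learned linear projection across the token (time) axis, is the essential idea.

```python
# Illustrative sketch of a gMLP block; dims and seq_len are placeholders,
# not the configuration used in the paper.
import torch
import torch.nn as nn


class SpatialGatingUnit(nn.Module):
    """Gates one half of the channels with a spatial (token-axis) projection
    of the other half -- the attention-free mixing mechanism of gMLP."""

    def __init__(self, dim_ffn: int, seq_len: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim_ffn // 2)
        # Linear layer over the token axis replaces self-attention.
        self.spatial_proj = nn.Linear(seq_len, seq_len)
        # Near-identity initialization, as suggested in the gMLP paper.
        nn.init.zeros_(self.spatial_proj.weight)
        nn.init.ones_(self.spatial_proj.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        u, v = x.chunk(2, dim=-1)                  # split channels in half
        v = self.norm(v)
        v = self.spatial_proj(v.transpose(1, 2)).transpose(1, 2)
        return u * v                               # element-wise gating


class GMLPBlock(nn.Module):
    def __init__(self, dim: int, dim_ffn: int, seq_len: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.proj_in = nn.Linear(dim, dim_ffn)
        self.act = nn.GELU()
        self.sgu = SpatialGatingUnit(dim_ffn, seq_len)
        self.proj_out = nn.Linear(dim_ffn // 2, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        shortcut = x
        x = self.act(self.proj_in(self.norm(x)))
        x = self.sgu(x)
        return shortcut + self.proj_out(x)


# Usage on spectrogram-like features of shape (batch, time_frames, feature_dim):
x = torch.randn(2, 98, 64)
block = GMLPBlock(dim=64, dim_ffn=256, seq_len=98)
print(block(x).shape)  # torch.Size([2, 98, 64])
```

Note that, unlike self-attention, the spatial projection is tied to a fixed sequence length, which suits keyword spotting since the audio clips are padded or cropped to a fixed duration.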