Keyword spotting (KWS) enables speech-based user interaction and is gradually becoming an indispensable component of smart devices. Recently, end-to-end (E2E) methods have become the most popular approach for on-device KWS tasks. However, there is still a gap between the research and deployment of E2E KWS methods. In this paper, we introduce WeKws, a production-quality, easy-to-build, and easy-to-apply E2E KWS toolkit. WeKws contains implementations of several state-of-the-art backbone networks, with which it achieves highly competitive results on three publicly available datasets. To make WeKws a pure E2E toolkit, we utilize a refined max-pooling loss that lets the model learn the ending position of the keyword by itself, which significantly simplifies the training pipeline and makes WeKws efficient to apply in real-world scenarios. The toolkit is publicly available at https://github.com/wenet-e2e/wekws.
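To illustrate the idea behind a max-pooling loss, the following is a minimal sketch of one common formulation, not the refined variant implemented in WeKws. It assumes the model emits one keyword posterior per frame. For a keyword utterance, only the highest-scoring frame is pushed toward 1; for a non-keyword utterance, that same frame is pushed toward 0. No frame-level alignment is required, which is why the model can locate the keyword ending on its own. The function name `max_pooling_loss` and its parameters are hypothetical and do not come from the toolkit.

```python
import math


def max_pooling_loss(posteriors, is_keyword, eps=1e-10):
    """Illustrative max-pooling loss for a single utterance.

    posteriors: per-frame keyword probabilities in [0, 1] (assumed input).
    is_keyword: True if the utterance contains the keyword.
    Returns (loss, peak_index), where peak_index is the frame that
    received the highest keyword posterior.
    """
    if not posteriors:
        raise ValueError("posteriors must contain at least one frame")

    # Max pooling over time: select the single most confident frame.
    peak_index = max(range(len(posteriors)), key=posteriors.__getitem__)

    # Clamp to avoid log(0).
    peak = min(max(posteriors[peak_index], eps), 1.0 - eps)

    if is_keyword:
        # Positive utterance: raise the posterior of the best frame.
        loss = -math.log(peak)
    else:
        # Negative utterance: suppress the most confusable frame.
        loss = -math.log(1.0 - peak)
    return loss, peak_index
```

For example, `max_pooling_loss([0.1, 0.2, 0.9, 0.3], True)` selects frame 2 as the peak and returns a small loss, since the model is already confident at that frame.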