Conversational agents commonly use keyword spotting (KWS) to initiate voice interaction with the user. For user-experience and privacy reasons, existing approaches to KWS have focused largely on accuracy, often at the cost of added latency. To address this tradeoff, we propose a novel approach to controlling KWS model latency that generalizes to any loss function and does not require explicit knowledge of the keyword endpoint. A single tunable hyperparameter lets one balance detection latency against accuracy for the target application. Empirically, we show that our approach outperforms existing methods under latency constraints. In particular, at a fixed latency target we achieve a substantial 25\% relative reduction in false accepts compared with the state-of-the-art baseline. We also show that when our approach is combined with a max-pooling loss, it reduces false accepts by 25\% relative to cross-entropy loss at a fixed latency.
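The abstract contrasts a max-pooling loss with cross-entropy loss but does not give either formulation. The sketch below shows one common binary variant, assuming a single keyword and a per-frame keyword posterior. It is not the authors' latency-control method, which the abstract does not describe. The function names `frame_cross_entropy` and `max_pooling_loss` are hypothetical. Frame-level cross entropy needs a 0/1 label for every frame, typically obtained from a forced alignment. The max-pooling variant needs only an utterance-level label: for a keyword utterance it penalizes just the frame with the highest keyword posterior.

```python
import math
from typing import Sequence

EPS = 1e-12  # clamp probabilities away from 0 and 1 to keep log() finite


def _clamp(p: float) -> float:
    return min(max(p, EPS), 1.0 - EPS)


def frame_cross_entropy(posteriors: Sequence[float], labels: Sequence[int]) -> float:
    """Mean binary cross entropy over frames.

    Requires one 0/1 keyword label per frame (e.g. from a forced alignment).
    """
    total = 0.0
    for p, y in zip(posteriors, labels):
        p = _clamp(p)
        total += -math.log(p) if y == 1 else -math.log(1.0 - p)
    return total / len(posteriors)


def max_pooling_loss(posteriors: Sequence[float], is_keyword: bool) -> float:
    """Hypothetical binary max-pooling loss using only an utterance-level label.

    Keyword utterance: cross entropy at the single frame with the highest
    keyword posterior, so the model may fire at whichever frame it is most
    confident. Non-keyword utterance: every frame is treated as negative.
    """
    if is_keyword:
        return -math.log(_clamp(max(posteriors)))
    return frame_cross_entropy(posteriors, [0] * len(posteriors))
```

For example, `max_pooling_loss([0.1, 0.9, 0.2], True)` depends only on the 0.9 frame. Because nothing constrains *when* that peak occurs, a model trained this way is free to fire late, which is one reason latency control becomes a concern with such losses.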