We present dual-attention neural biasing, an architecture designed to boost Wake Word (WW) recognition and reduce inference latency on speech recognition tasks. The architecture dynamically switches its runtime compute paths, exploiting WW spotting to select which branch of its attention networks to execute for each input audio frame. With this approach, we improve WW spotting accuracy while reducing runtime compute cost, measured in floating point operations (FLOPs). Using an in-house de-identified dataset, we demonstrate that the proposed dual-attention network can reduce compute cost by $90\%$ for WW audio frames, with only a $1\%$ increase in the number of parameters. This architecture improves the WW F1 score by $16\%$ relative and reduces the generic rare word error rate by $3\%$ relative compared to the baselines.
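To make the dynamic-switch idea concrete, below is a minimal PyTorch sketch of one plausible dual-branch biasing layer. The module name `DualAttentionBiasing`, the branch dimensions, and the per-frame routing via a boolean WW mask are illustrative assumptions, not the paper's actual design; for simplicity the sketch evaluates both branches and selects per frame, whereas a real runtime would dispatch each frame to only one branch, which is where the FLOP savings come from.

```python
import torch
import torch.nn as nn

class DualAttentionBiasing(nn.Module):
    """Illustrative dual-attention biasing layer: a full-size attention
    branch for generic audio frames and a lightweight branch for frames
    the wake-word (WW) spotter has flagged, so WW frames can take a much
    cheaper compute path at runtime."""

    def __init__(self, d_model: int = 256, d_small: int = 32, n_heads: int = 4):
        super().__init__()
        # Full-capacity branch, executed for generic (non-WW) frames.
        self.full_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Cheap WW branch: project down, attend in a small space, project up.
        self.down = nn.Linear(d_model, d_small)
        self.small_attn = nn.MultiheadAttention(d_small, 1, batch_first=True)
        self.up = nn.Linear(d_small, d_model)

    def forward(self, frames: torch.Tensor, context: torch.Tensor,
                ww_mask: torch.Tensor) -> torch.Tensor:
        # frames:  (B, T, d_model) encoder outputs per audio frame
        # context: (B, C, d_model) biasing context embeddings
        # ww_mask: (B, T) bool, True where the WW spotter fired
        full_out, _ = self.full_attn(frames, context, context)
        q = self.down(frames)
        kv = self.down(context)
        small_out, _ = self.small_attn(q, kv, kv)
        small_out = self.up(small_out)
        # Sketch-only selection: both branches were computed above; a
        # deployed system would execute just the branch each frame needs.
        return torch.where(ww_mask.unsqueeze(-1), small_out, full_out)

# Usage sketch with hypothetical shapes.
if __name__ == "__main__":
    layer = DualAttentionBiasing()
    frames = torch.randn(2, 50, 256)           # 2 utterances, 50 frames
    context = torch.randn(2, 10, 256)          # 10 biasing entries each
    ww_mask = torch.zeros(2, 50, dtype=torch.bool)
    ww_mask[:, :5] = True                      # first 5 frames flagged as WW
    print(layer(frames, context, ww_mask).shape)  # torch.Size([2, 50, 256])
```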