We present dual-attention neural biasing, an architecture designed to boost wake word (WW) recognition accuracy and reduce inference-time latency in speech recognition tasks. The architecture dynamically switches its runtime compute path by exploiting WW spotting to select which branch of its attention networks to execute for each input audio frame. With this approach, we improve WW spotting accuracy while reducing runtime compute cost, measured in floating point operations (FLOPs). On an in-house de-identified dataset, we demonstrate that the proposed dual-attention network reduces compute cost by $90\%$ for WW audio frames, with only a $1\%$ increase in the number of parameters. Relative to the baselines, the architecture improves WW F1 score by $16\%$ and reduces the error rate on generic rare words by $3\%$.
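The abstract does not give implementation details, but the frame-level branch switch it describes can be illustrated with a minimal PyTorch sketch. Everything here is an assumption for illustration only: the class name, tensor shapes, and the per-utterance loop are hypothetical and not the authors' code. The idea shown is that frames flagged by the WW spotter attend only to a small set of WW phrase embeddings, while all other frames attend to the much larger generic biasing catalog, so WW frames skip most of the attention FLOPs.

```python
# Hypothetical sketch of per-frame dual-attention branch switching
# (assumed PyTorch; names and shapes are illustrative, not from the paper).
import torch
import torch.nn as nn


class DualAttentionBiasing(nn.Module):
    """Routes each audio frame to one of two attention branches.

    WW-flagged frames attend over a small WW embedding table; all other
    frames attend over the full biasing catalog, so the expensive branch
    is skipped for WW audio frames.
    """

    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        # Cheap branch: attends over a handful of WW phrase embeddings.
        self.ww_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Expensive branch: attends over the full biasing catalog.
        self.generic_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, frames, ww_mask, ww_embeds, catalog_embeds):
        # frames:         (B, T, d_model) encoder outputs
        # ww_mask:        (B, T) bool, True where the WW spotter fired
        # ww_embeds:      (B, K_ww, d_model), with K_ww << K_cat
        # catalog_embeds: (B, K_cat, d_model)
        out = torch.empty_like(frames)
        for b in range(frames.size(0)):
            is_ww = ww_mask[b]
            if is_ww.any():
                # Small attention: queries are only the WW frames.
                q = frames[b, is_ww].unsqueeze(0)
                attended, _ = self.ww_attn(
                    q, ww_embeds[b : b + 1], ww_embeds[b : b + 1]
                )
                out[b, is_ww] = attended.squeeze(0)
            if (~is_ww).any():
                # Full attention: remaining frames see the whole catalog.
                q = frames[b, ~is_ww].unsqueeze(0)
                attended, _ = self.generic_attn(
                    q, catalog_embeds[b : b + 1], catalog_embeds[b : b + 1]
                )
                out[b, ~is_ww] = attended.squeeze(0)
        return out
```

Under these assumptions, the compute saving follows directly from the key/value lengths: a WW frame's attention scales with K_ww rather than K_cat, which is consistent with the reported $90\%$ FLOPs reduction on WW frames when the WW table is much smaller than the generic catalog.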