Virtual assistants use automatic speech recognition (ASR) to help users answer entity-centric queries. However, spoken entity recognition is a difficult problem because of the large number of frequently changing named entities. In addition, the resources available for recognition are constrained when ASR is performed on-device. In this work, we investigate the use of probabilistic grammars as language models within the finite-state transducer (FST) framework. We introduce a deterministic approximation to probabilistic grammars that avoids the explicit expansion of non-terminals at model creation time, integrates directly with the FST framework, and is complementary to n-gram models. We obtain a 10% relative word error rate improvement on long-tail entity queries over a similarly sized n-gram model used without our method.
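To make the idea of avoiding explicit non-terminal expansion more concrete, here is a minimal Python sketch of a toy probabilistic grammar. It is an illustration only: it is not the FST construction or the deterministic approximation from this work. The templates, entity lists, probabilities, and the function names `match` and `sentence_prob` are all hypothetical. The sketch scores a query by consulting a non-terminal's entity list only when that non-terminal is reached, rather than precomputing every template-entity combination up front.

```python
# Hypothetical toy grammar: template probabilities (assumed values).
TEMPLATES = {
    ("play", "$ARTIST"): 0.6,
    ("directions", "to", "$CITY"): 0.4,
}

# Hypothetical entity distributions for each non-terminal (assumed values).
ENTITIES = {
    "$ARTIST": {("taylor", "swift"): 0.7, ("queen",): 0.3},
    "$CITY": {("new", "york"): 0.5, ("paris",): 0.5},
}


def match(template, words):
    """Probability of `words` under one template.

    Non-terminals (tokens starting with "$") are resolved lazily: the
    entity list is consulted only when the non-terminal is reached.
    """
    if not template:
        # The template is exhausted: succeed only if no words remain.
        return 1.0 if not words else 0.0
    head, rest = template[0], template[1:]
    if head.startswith("$"):
        total = 0.0
        for entity, prob in ENTITIES[head].items():
            n = len(entity)
            if tuple(words[:n]) == entity:
                total += prob * match(rest, words[n:])
        return total
    if words and words[0] == head:
        return match(rest, words[1:])
    return 0.0


def sentence_prob(sentence):
    """Sum the probability of a sentence over all templates."""
    words = tuple(sentence.split())
    return sum(p * match(t, words) for t, p in TEMPLATES.items())
```

Under these assumed numbers, `sentence_prob("play taylor swift")` is about 0.42 (0.6 × 0.7), while a query such as `"play paris"` scores 0.0 because `paris` is not listed under `$ARTIST`. A practical system would replace the linear scan over entities with a more compact structure; the scan is kept here only for readability.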