Voice search for points of interest (POIs) is becoming increasingly popular. However, speech recognition for local POIs remains a challenge because of multiple dialects and the massive number of POIs. This paper improves speech recognition accuracy for local POIs in two ways. First, a geographic acoustic model (Geo-AM) is proposed. The Geo-AM handles the multi-dialect problem using a dialect-specific input feature and a dialect-specific top layer. Second, a group of geo-specific language models (Geo-LMs) is integrated into our speech recognition system to improve recognition accuracy for long-tail and homophone POIs. During decoding, specific language models are selected on demand according to the user's geographic location. Experiments show that the proposed Geo-AM achieves a 6.5% to 10.1% relative character error rate (CER) reduction on an accented test set, and that the proposed Geo-AM and Geo-LMs together achieve over 18.7% relative CER reduction on the Tencent Map task.
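The abstract describes these mechanisms only at a high level, so the following is a toy Python sketch under stated assumptions, not the authors' implementation. It assumes the dialect-specific input feature is a one-hot dialect code appended to each acoustic frame, and it reduces the network to a single shared ReLU layer followed by one output layer per dialect. The Geo-LM keys (city names) and LM file names are hypothetical. How the selected LMs are combined during decoding (for example, by interpolation) is not specified in the abstract and is not shown. The last function shows how a relative CER reduction is computed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def one_hot(index: int, size: int) -> List[float]:
    """One-hot vector; an ASSUMED stand-in for the dialect-specific input feature."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def matvec(weights: List[List[float]], x: List[float]) -> List[float]:
    """Dense matrix-vector product on plain lists (each row must match len(x))."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(v: List[float]) -> List[float]:
    return [max(0.0, a) for a in v]


@dataclass
class GeoAMSketch:
    """Toy Geo-AM: one shared hidden layer plus a separate top layer per dialect."""
    dialects: List[str]
    shared_w: List[List[float]]          # hidden x (feat_dim + len(dialects))
    top_w: Dict[str, List[List[float]]]  # dialect -> (n_outputs x hidden)

    def forward(self, frame: List[float], dialect: str) -> List[float]:
        d = self.dialects.index(dialect)
        # Dialect-specific input: append the (assumed one-hot) dialect code.
        x = frame + one_hot(d, len(self.dialects))
        # Hidden layer whose parameters are shared by all dialects.
        hidden = relu(matvec(self.shared_w, x))
        # Dialect-specific top layer: pick the output weights for this dialect.
        return matvec(self.top_w[dialect], hidden)


@dataclass
class GeoLMSelector:
    """Chooses LMs per request from the user's location (hypothetical city keys)."""
    general_lm: str
    geo_lms: Dict[str, str] = field(default_factory=dict)

    def select(self, city: Optional[str]) -> List[str]:
        # Always keep a general LM; add the geo-specific LM only if one exists.
        lms = [self.general_lm]
        if city is not None and city in self.geo_lms:
            lms.append(self.geo_lms[city])
        return lms


def relative_cer_reduction(baseline_cer: float, new_cer: float) -> float:
    """Relative reduction: a drop from 10.0% to 9.0% CER is 0.1 (10% relative)."""
    return (baseline_cer - new_cer) / baseline_cer


am = GeoAMSketch(
    dialects=["mandarin", "cantonese"],
    shared_w=[[1.0, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 0.5]],
    top_w={"mandarin": [[1.0, 0.0]], "cantonese": [[0.0, 1.0]]},
)
selector = GeoLMSelector("general.lm", {"Shenzhen": "shenzhen_poi.lm"})
```

With these toy weights, the same frame `[1.0, 2.0]` yields `[1.5]` when treated as Mandarin and `[2.5]` when treated as Cantonese, because both the appended dialect code and the selected top layer differ. A request from Shenzhen uses both the general and the Shenzhen LM, while a request with no known location falls back to the general LM alone.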