Automatic Speech Recognition (ASR) can assist speech communication between pilots and air-traffic controllers. Its application can significantly reduce the complexity of the task and increase the reliability of the transmitted information. High-accuracy predictions are needed to minimize the risk of errors, especially in the recognition of key information, such as commands and callsigns, that is used to navigate pilots. Our results show that surveillance data containing callsigns can considerably improve the recognition of a callsign in an utterance when the weights of probable callsign n-grams are reduced per utterance. In this paper, we investigate two approaches: (1) G-boosting, in which callsign weights are adjusted at the language model level (G), followed by a dynamic decoder with on-the-fly composition, and (2) lattice rescoring, in which callsign information is introduced on top of lattices generated by a conventional decoder. Boosting callsign n-grams with a combination of the two methods yielded a 28.4% absolute improvement in callsign recognition accuracy and up to a 74.2% relative improvement in the WER of callsign recognition.
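To make the idea of per-utterance callsign boosting concrete, the following is a minimal sketch, not the method evaluated in the paper. It rescores a plain n-best list instead of a lattice or a language-model WFST, and it assumes that weights are costs, so that lowering the cost of a hypothesis makes it more likely to be selected. The function names, the `discount` value, the n-gram orders, and the example utterances are all hypothetical.

```python
def callsign_ngrams(callsigns, min_n=2, max_n=3):
    """Collect word n-grams (orders min_n..max_n) from the callsigns
    that surveillance data marks as plausible for this utterance."""
    grams = set()
    for callsign in callsigns:
        words = callsign.split()
        for n in range(min_n, max_n + 1):
            for i in range(len(words) - n + 1):
                grams.add(tuple(words[i:i + n]))
    return grams


def boosted_cost(text, cost, grams, discount, min_n=2, max_n=3):
    """Lower the hypothesis cost by `discount` for every occurrence
    of a plausible callsign n-gram in the hypothesis."""
    words = text.split()
    for n in range(min_n, max_n + 1):
        for i in range(len(words) - n + 1):
            if tuple(words[i:i + n]) in grams:
                cost -= discount
    return cost


def rescore(nbest, callsigns, discount=0.5):
    """Return the hypothesis text with the lowest boosted cost.

    `nbest` is a list of (text, cost) pairs; lower cost is better.
    """
    grams = callsign_ngrams(callsigns)
    return min(nbest, key=lambda h: boosted_cost(h[0], h[1], grams, discount))[0]


if __name__ == "__main__":
    # Hypothetical decoder output: the wrong callsign has the lower cost.
    nbest = [
        ("lufthansa three two alpha descend flight level eight zero", 12.0),
        ("lufthansa three two papa descend flight level eight zero", 12.5),
    ]
    # Hypothetical callsigns taken from surveillance data for this utterance.
    callsigns = ["lufthansa three two papa", "air france one two three"]
    print(rescore(nbest, callsigns))
```

In this toy setting the second hypothesis matches more callsign n-grams than the first, so its cost drops below that of the first hypothesis and it is selected. A lattice-based system would apply comparable discounts to arcs or paths rather than to whole n-best entries.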