In this paper we propose a novel virtual simulation-pilot engine that speeds up air traffic controller (ATCo) training by integrating several state-of-the-art artificial intelligence (AI) based tools. The engine receives spoken communications from ATCo trainees and performs automatic speech recognition and understanding. It therefore goes beyond transcribing the communication and also understands its meaning. The output is then sent to a response generator, which produces the spoken read-back that a pilot would give to the ATCo trainee. The overall pipeline is composed of the following submodules: (i) an automatic speech recognition (ASR) system that transforms audio into a sequence of words; (ii) a high-level air traffic control (ATC) entity parser that understands the transcribed voice communication; and (iii) a text-to-speech submodule that generates a pilot-like spoken utterance based on the state of the dialogue. Our system employs state-of-the-art AI-based tools such as Wav2Vec 2.0, Conformer, BERT and Tacotron models. To the best of our knowledge, this is the first work fully based on open-source ATC resources and AI tools. In addition, we have developed a robust and modular system with optional submodules. These can enhance performance by incorporating real-time surveillance data or exercise-related metadata (such as sectors or runways). They can even introduce deliberate read-back errors so that ATCo trainees learn to identify them. Our ASR system reaches word error rates (WER) as low as 5.5% and 15.9% on high- and low-quality ATC audio, respectively. We also demonstrate that adding surveillance data to the ASR can yield a callsign detection accuracy of more than 96%.
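The data flow through the three submodules (ASR, entity parser, read-back generation followed by TTS), including the optional deliberate read-back error, can be illustrated with the minimal sketch below. It is not the authors' implementation. It rests on several assumptions. The ASR (Wav2Vec 2.0 / Conformer) and TTS (Tacotron) models are injected as plain callables. The BERT-based entity parser is replaced by a hypothetical rule-based stand-in over lower-case, word-level transcripts. All names (`parse_entities`, `generate_readback`, `simulation_pilot`, `COMMANDS`) are illustrative. The optional surveillance-data and metadata submodules are omitted.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical command vocabulary; the real system uses a trained BERT entity parser.
COMMANDS = ("descend", "climb", "turn left heading", "turn right heading", "contact")
# ICAO-style digit words ("niner" for 9).
DIGITS = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "niner"]


@dataclass
class AtcEntities:
    callsign: str
    command: str
    value: str


def parse_entities(transcript: str) -> Optional[AtcEntities]:
    """Rule-based stand-in: split '<callsign> <command> <value>' on the first known command."""
    for cmd in COMMANDS:
        idx = transcript.find(f" {cmd} ")
        if idx != -1:
            callsign = transcript[:idx].strip()
            value = transcript[idx + len(cmd) + 2:].strip()  # skip both surrounding spaces
            return AtcEntities(callsign, cmd, value)
    return None  # nothing understood


def inject_readback_error(value: str) -> str:
    """Deliberate error for training: bump the first digit word to the next digit."""
    words = value.split()
    for i, w in enumerate(words):
        if w in DIGITS:
            words[i] = DIGITS[(DIGITS.index(w) + 1) % len(DIGITS)]
            break
    return " ".join(words)


def generate_readback(e: AtcEntities, with_error: bool = False) -> str:
    """Pilot-style read-back: instruction first, callsign last."""
    value = inject_readback_error(e.value) if with_error else e.value
    return f"{e.command} {value} {e.callsign}"


def simulation_pilot(audio: object,
                     asr: Callable[[object], str],
                     tts: Callable[[str], object],
                     with_error: bool = False) -> object:
    """ASR -> entity parsing -> read-back text -> TTS."""
    transcript = asr(audio)
    entities = parse_entities(transcript)
    if entities is None:
        return tts("say again")  # ask the trainee to repeat
    return tts(generate_readback(entities, with_error))
```

For a quick test, identity functions can play the roles of ASR and TTS. `simulation_pilot("lufthansa one two three descend flight level eight zero", asr=lambda a: a, tts=lambda t: t)` yields `"descend flight level eight zero lufthansa one two three"`. With `with_error=True`, it yields `"descend flight level niner zero lufthansa one two three"`, which a trainee would be expected to catch. Keeping each stage behind a plain callable mirrors the modular design described above: any stage can be swapped without touching the others.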