Lessons Learned in ATCO2: 5000 hours of Air Traffic Control Communications for Robust Automatic Speech Recognition and Understanding

Juan Zuluaga-Gomez, Iuliia Nigmatulina, Amrutha Prasad, Petr Motlicek, Driss Khalil, Srikanth Madikeri, Allan Tart, Igor Szoke, Vincent Lenders, Mickael Rigault, Khalid Choukri

From arXiv; manuscript under review.

Voice communication between air traffic controllers (ATCos) and pilots is critical for ensuring safe and efficient air traffic control (ATC). This task requires high levels of awareness from ATCos and can be tedious and error-prone. Recent attempts have been made to integrate artificial intelligence (AI) into ATC in order to reduce the workload of ATCos. However, the development of data-driven AI systems for ATC demands large-scale annotated datasets, which are currently lacking in the field. This paper explores the lessons learned from the ATCO2 project, which aimed to develop a unique platform to collect and preprocess large amounts of ATC data from airspace in real time. Audio and surveillance data were collected from publicly accessible radio frequency channels with VHF receivers owned by a community of volunteers and later uploaded to OpenSky Network servers, which can be considered an "unlimited source" of data. In addition, this paper reviews previous work from ATCO2 partners, including (i) robust automatic speech recognition (ASR), (ii) natural language processing, (iii) English language identification of ATC communications, and (iv) the integration of surveillance data such as ADS-B. We believe that the pipeline developed during the ATCO2 project, along with the open-sourcing of its data, will encourage research in the ATC field. A sample of the ATCO2 corpus is available at https://www.atco2.org/data, while the full corpus can be purchased through ELDA at http://catalog.elra.info/en-us/repository/browse/ELRA-S0484. We demonstrate that ATCO2 is an appropriate dataset for developing ASR engines when little or no in-domain ATC data is available. For instance, with the CNN-TDNNf Kaldi model, we reached word error rates (WER) as low as 17.9% and 24.9% on public ATC datasets, which is 6.6/7.6% better than an "out-of-domain" but supervised CNN-TDNNf model.
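
The results in the abstract are reported as word error rate (WER), conventionally defined as WER = (S + D + I) / N, where S, D, and I are the word substitutions, deletions, and insertions needed to turn the reference transcript into the recognizer output, and N is the number of reference words. As a reminder of how that metric is typically computed, here is a minimal Python sketch. It is not the ATCO2 or Kaldi scoring code; the function name `word_error_rate` and the example utterances are illustrative assumptions, and real evaluations usually normalize text (casing, callsign spelling, numbers) before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name), not taken from the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up ATC-style utterance: the hypothesis drops the word "flight".
reference = "lufthansa three two one descend flight level eight zero"
hypothesis = "lufthansa three two one descend level eight zero"
wer = word_error_rate(reference, hypothesis)
print(f"WER = {wer:.1%}")
```

In this toy case one of the nine reference words is missing, so the WER is 1/9. A corpus-level figure such as those quoted in the abstract is normally obtained by summing errors and reference words over all utterances rather than averaging per-utterance rates.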
