Increasing traffic demands, higher levels of automation, and communication enhancements provide novel design opportunities for future air traffic controllers (ATCs). This article presents a novel deep reinforcement learning (DRL) controller to aid conflict resolution for autonomous free flight. Although DRL has achieved important advancements in this field, existing works pay little attention to the explainability and safety issues of DRL controllers, particularly their safety under adversarial attacks. To address these two issues, we design a fully explainable DRL framework in which we: 1) decompose the coupled Q-value learning model into a safety-awareness module and an efficiency (reach-the-target) module; and 2) use information from surrounding intruders as inputs, eliminating the need for a central controller. In our simulated experiments, we show that by decoupling safety awareness from efficiency, we can achieve higher performance on free flight control tasks while dramatically improving explainability in practice. In addition, the safety Q-learning module provides rich information about the safety situation of the environment. To study safety under adversarial attacks, we further propose an adversarial attack strategy that can impose both safety-oriented and efficiency-oriented attacks. The adversary aims to minimize safety/efficiency by attacking the agent at only a few time steps. In our experiments, the proposed strategy induces as many collisions as the uniform attack (i.e., attacking at every time step) while attacking the agent four times less often, which provides insights into the capabilities and restrictions of DRL in future ATC designs. The source code is publicly available at https://github.com/WLeiiiii/Gym-ATC-Attack-Project.
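To illustrate the decoupled design described above, the following is a minimal, self-contained sketch of how a safety-awareness Q-module and an efficiency Q-module might be combined at action-selection time. All names (`q_safety`, `q_efficiency`, `select_action`, `safety_threshold`) and the "filter unsafe actions, then pick the most efficient" fusion rule are illustrative assumptions, not the paper's exact implementation; the toy Q-functions are random linear maps standing in for trained networks.

```python
import numpy as np

rng = np.random.default_rng(0)
N_ACTIONS = 3   # e.g., turn left, hold heading, turn right
STATE_DIM = 4   # toy state: relative intruder position/velocity features

# Hypothetical stand-ins for the two learned Q-networks: one estimates
# safety (e.g., negative collision risk), the other progress to the target.
W_safety = rng.normal(size=(N_ACTIONS, STATE_DIM))
W_efficiency = rng.normal(size=(N_ACTIONS, STATE_DIM))

def q_safety(state):
    # Higher value = safer action under this toy linear model.
    return W_safety @ state

def q_efficiency(state):
    # Higher value = faster progress toward the target waypoint.
    return W_efficiency @ state

def select_action(state, safety_threshold=0.0):
    """Pick the most efficient action among those deemed safe.

    If no action clears the safety threshold, fall back to the safest
    action. This is one plausible reading of the decoupled framework;
    the paper's actual fusion rule may differ.
    """
    qs = q_safety(state)
    qe = q_efficiency(state)
    safe_mask = qs >= safety_threshold
    if safe_mask.any():
        # Mask out unsafe actions, then maximize efficiency.
        return int(np.argmax(np.where(safe_mask, qe, -np.inf)))
    return int(np.argmax(qs))

state = rng.normal(size=STATE_DIM)
action = select_action(state)
```

A benefit of this separation is explainability: at any step one can inspect `q_safety(state)` alone to see which actions the agent currently considers risky, independent of efficiency considerations.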