Attitude control of fixed-wing unmanned aerial vehicles (UAVs) is a difficult control problem, in part due to uncertain nonlinear dynamics, actuator constraints, and coupled longitudinal and lateral motions. Current state-of-the-art autopilots are based on linear control and are thus limited in their effectiveness and performance. Deep reinforcement learning (DRL) is a machine learning method that automatically discovers an optimal control law through interaction with the controlled system and can handle complex nonlinear dynamics. We show in this paper that DRL can successfully learn to perform attitude control of a fixed-wing UAV operating directly on the original nonlinear dynamics, requiring as little as three minutes of flight data. We initially train our model in a simulation environment and then deploy the learned controller on the UAV in flight tests, demonstrating performance comparable to the state-of-the-art ArduPlane proportional-integral-derivative (PID) attitude controller with no further online learning required. Learning with a significant actuation delay and diversified simulated dynamics was found to be crucial for successful transfer to control of the real UAV. In addition to a qualitative comparison with the ArduPlane autopilot, we present a quantitative assessment based on linear analysis to better understand the learned controller's behavior.
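The two sim-to-real measures highlighted above (training with actuation delay and with diversified simulated dynamics) can be illustrated with a minimal environment sketch. This is not the paper's simulator; the class name, dynamics model, and parameter ranges are illustrative assumptions only.

```python
import random
from collections import deque

class DelayedRandomizedEnv:
    """Toy 1-DOF attitude environment sketching two sim-to-real measures:
    (1) actuation delay: commands pass through a FIFO before taking effect;
    (2) diversified dynamics: inertia and damping are resampled per episode.
    All names and numeric ranges are illustrative, not the paper's values."""

    def __init__(self, delay_steps=3, dt=0.02):
        self.delay_steps = delay_steps
        self.dt = dt
        self.reset()

    def reset(self):
        # Diversified dynamics: resample uncertain parameters each episode.
        self.inertia = random.uniform(0.8, 1.2)
        self.damping = random.uniform(0.05, 0.15)
        self.angle, self.rate = 0.0, 0.0
        # Actuation delay: pre-fill the queue with neutral commands.
        self.pending = deque([0.0] * self.delay_steps)
        return (self.angle, self.rate)

    def step(self, action):
        self.pending.append(action)
        applied = self.pending.popleft()  # delayed command reaches the actuator
        accel = (applied - self.damping * self.rate) / self.inertia
        self.rate += accel * self.dt
        self.angle += self.rate * self.dt
        reward = -abs(self.angle)         # penalize attitude error from zero
        return (self.angle, self.rate), reward
```

A policy trained against such an environment must learn to act despite its commands only taking effect `delay_steps` control periods later, and must remain robust across the sampled dynamics, which is the mechanism the abstract credits for successful transfer.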