Landing a quadrotor on an inclined surface is a challenging manoeuvre. The final state of any inclined landing trajectory is not an equilibrium, which precludes the use of most conventional control methods. We propose a deep reinforcement learning approach to design an autonomous landing controller for inclined surfaces. With the proximal policy optimization (PPO) algorithm, sparse rewards, and a tailored curriculum learning approach, a robust policy can be trained in simulation in less than 90 minutes on a standard laptop. The trained policy then runs directly on a real Crazyflie 2.1 quadrotor and successfully performs inclined landings in a flying arena. A single policy evaluation takes approximately 2.5 ms, which makes it suitable for a future embedded implementation on the quadrotor.
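To make the two training ingredients named in the abstract more concrete, the following is a minimal sketch of what a sparse landing reward and a success-driven curriculum could look like. It is not the authors' implementation: the planar `State` type, the tolerances, the inclination levels, and the promotion rule in `Curriculum` are all hypothetical choices made only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class State:
    """Planar quadrotor pose (hypothetical simplification for illustration)."""
    x: float      # horizontal position [m]
    z: float      # vertical position [m]
    theta: float  # pitch angle [rad]


def sparse_reward(state, goal, pos_tol=0.05, angle_tol=math.radians(10)):
    """Return 1.0 only if the pose is within tolerance of the goal pose, else 0.0.

    The tolerances are assumed values, not taken from the paper.
    """
    pos_err = math.hypot(state.x - goal.x, state.z - goal.z)
    angle_err = abs(state.theta - goal.theta)
    return 1.0 if pos_err <= pos_tol and angle_err <= angle_tol else 0.0


class Curriculum:
    """Raise the surface inclination once recent landings succeed often enough.

    All levels and thresholds are assumptions chosen for the example.
    """

    def __init__(self, start_deg=5.0, max_deg=25.0, step_deg=5.0,
                 threshold=0.8, window=20):
        self.level_deg = start_deg
        self.max_deg = max_deg
        self.step_deg = step_deg
        self.threshold = threshold
        self.window = window
        self.results = []

    def record(self, success):
        """Store one episode outcome and advance the level if warranted."""
        self.results.append(bool(success))
        if len(self.results) < self.window:
            return
        recent = self.results[-self.window:]
        success_rate = sum(recent) / self.window
        if success_rate >= self.threshold and self.level_deg < self.max_deg:
            self.level_deg = min(self.level_deg + self.step_deg, self.max_deg)
            self.results.clear()  # start a fresh window at the new level
```

In a training loop, each episode would sample a task at `level_deg`, evaluate `sparse_reward` on the terminal state, and pass the outcome to `record`. Because the reward is zero almost everywhere, starting from easy tasks is one way to ensure the policy sees successful landings early in training.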