Landing a quadrotor on an inclined surface is a challenging maneuver. The final state of any inclined landing trajectory is not an equilibrium, which precludes the use of most conventional control methods. We propose a deep reinforcement learning approach to design an autonomous landing controller for inclined surfaces. Using the proximal policy optimization (PPO) algorithm with sparse rewards and a tailored curriculum learning approach, an inclined landing policy can be trained in simulation in less than 90 minutes on a standard laptop. The policy then directly runs on a real Crazyflie 2.1 quadrotor and successfully performs real inclined landings in a flying arena. A single policy evaluation takes approximately 2.5\,ms, which makes it suitable for a future embedded implementation on the quadrotor.
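The combination of a sparse reward and a curriculum can be illustrated with a minimal sketch. The abstract does not give the reward definition or the curriculum schedule, so everything below is an assumption made for illustration: the function `sparse_landing_reward`, the class `DistanceCurriculum`, the planar state (position plus a single tilt angle), and all tolerances and thresholds are hypothetical. The idea shown is that the agent is rewarded only when the terminal state matches the inclined pad, and that the range of starting distances grows once recent episodes mostly succeed.

```python
import math
import random
from collections import deque


def sparse_landing_reward(pos, tilt, target_pos, target_tilt,
                          pos_tol=0.05, tilt_tol=math.radians(10)):
    """Return 1.0 if the terminal state is within tolerance of the pad, else 0.0.

    All tolerances are illustrative assumptions, not values from the paper.
    """
    pos_err = math.dist(pos, target_pos)   # Euclidean distance to the pad [m]
    tilt_err = abs(tilt - target_tilt)     # attitude mismatch [rad]
    return 1.0 if pos_err <= pos_tol and tilt_err <= tilt_tol else 0.0


class DistanceCurriculum:
    """Hypothetical curriculum: widen the start-distance range after enough successes."""

    def __init__(self, start_radius=0.1, max_radius=1.0, growth=1.5,
                 threshold=0.8, window=50):
        self.radius = start_radius
        self.max_radius = max_radius
        self.growth = growth
        self.threshold = threshold
        self.window = window
        self.results = deque(maxlen=window)

    def sample_start_distance(self, rng=random):
        """Draw an initial distance from the pad within the current radius."""
        return rng.uniform(0.0, self.radius)

    def record(self, success):
        """Store an episode outcome and grow the radius if the window is mostly successes."""
        self.results.append(bool(success))
        if len(self.results) == self.window:
            rate = sum(self.results) / self.window
            if rate >= self.threshold:
                self.radius = min(self.max_radius, self.radius * self.growth)
                self.results.clear()  # start a fresh window at the new difficulty
```

In a training loop, one would call `sample_start_distance()` when resetting the simulated episode, compute `sparse_landing_reward(...)` at termination, and pass the outcome to `record(...)`; the PPO update itself is unchanged by this scheme.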