Inclined Quadrotor Landing using Deep Reinforcement Learning

Landing a quadrotor on an inclined surface is a challenging manoeuvre. The final state of any inclined landing trajectory is not an equilibrium, which precludes the use of most conventional control methods. We propose a deep reinforcement learning approach to design an autonomous landing controller for inclined surfaces. Using the proximal policy optimization (PPO) algorithm with sparse rewards and a tailored curriculum learning approach, a robust policy can be trained in simulation in less than 90 minutes on a standard laptop. The policy then directly runs on a real Crazyflie 2.1 quadrotor and successfully performs real inclined landings in a flying arena. A single policy evaluation takes approximately 2.5 ms, which makes it suitable for a future embedded implementation on the quadrotor.
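As a rough illustration of two ingredients named in the abstract, the sketch below shows the standard PPO clipped surrogate objective together with a sparse terminal reward. It is a minimal sketch, not the authors' implementation: the function names, the clipping range of 0.2, and the reward values of +1, -1 and 0 are assumptions chosen for illustration, and the curriculum is not modelled.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to be maximised).

    new_logp / old_logp: log-probabilities of the taken actions under the
    updated and the data-collecting policy. clip_eps=0.2 is an assumed value.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the two surrogate terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


def sparse_landing_reward(landed, crashed):
    """Hypothetical sparse reward: nonzero only when the episode ends."""
    if landed:
        return 1.0   # assumed bonus for a successful landing
    if crashed:
        return -1.0  # assumed penalty for a crash
    return 0.0       # no shaping signal during flight
```

With a sparse reward of this kind, most transitions carry no learning signal, which is one reason a curriculum that starts from easier landing conditions can help training converge.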

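Related content

Deep reinforcement learning

Deep reinforcement learning (DRL) is a machine learning approach that extends traditional reinforcement learning with deep learning techniques. The main task in traditional reinforcement learning is for an agent to learn, from the rewards it receives from its environment, the behaviour that maximizes reward. However, traditional model-free reinforcement learning methods need function approximation for the agent to learn a value function or a policy. Here the strong function approximation capability of deep learning became the natural replacement for hand-specified features, and it makes better-performing end-to-end learning possible.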
