Quadrotors are highly nonlinear dynamical systems that require carefully tuned controllers to be pushed to their physical limits. Recently, learning-based control policies have been proposed for quadrotors, as they could potentially learn direct mappings from high-dimensional raw sensory observations to actions. Because these methods are sample-inefficient, training such learned controllers on the real platform is impractical or even impossible. Training in simulation is attractive, but it requires transferring policies between domains, which demands that trained policies be robust to the resulting domain gap. In this work, we make two contributions: (i) we perform the first benchmark comparison of existing learned control policies for agile quadrotor flight and show that training a control policy that commands body rates and thrust results in more robust sim-to-real transfer than a policy that directly specifies individual rotor thrusts; and (ii) we demonstrate for the first time that such a control policy, trained via deep reinforcement learning, can control a quadrotor in real-world experiments at speeds over 45 km/h.