In this work, we show that low-level control policies can be trained with reinforcement learning entirely in simulation and then deployed on a quadrotor robot without fine-tuning on real-world data. To make zero-shot policy transfer feasible, we apply simulation optimization to narrow the reality gap. Our neural network-based policies use only onboard sensor data and run entirely on the drone's embedded hardware. In extensive real-world experiments, we compare three control structures, ranging from low-level pulse-width-modulated motor commands to high-level attitude control based on nested proportional-integral-derivative controllers. Our experiments show that low-level controllers trained with reinforcement learning require a more accurate simulation than higher-level control policies.