Recently, neural control policies have outperformed existing model-based planning-and-control methods for autonomously navigating quadrotors through cluttered environments in minimum time. However, they are not perception aware, a crucial requirement in vision-based navigation due to the camera's limited field of view and the underactuated nature of a quadrotor. We propose a method to learn neural network policies that achieve perception-aware, minimum-time flight in cluttered environments. Our method combines imitation learning and reinforcement learning (RL) by leveraging a privileged learning-by-cheating framework. Using RL, we first train a perception-aware teacher policy with full-state information to fly in minimum time through cluttered environments. Then, we use imitation learning to distill its knowledge into a vision-based student policy that perceives the environment only through a camera. Our approach tightly couples perception and control, yielding a significant advantage in computation speed (10x faster) and success rate. We demonstrate the closed-loop control performance using a physical quadrotor and hardware-in-the-loop simulation at speeds up to 50 km/h.