Recently, learning-based controllers have been shown to push mobile robotic systems to their limits and provide the robustness needed for many real-world applications. However, only classical optimization-based control frameworks offer the inherent flexibility to be dynamically adjusted during execution by, for example, setting target speeds or actuator limits. We present a framework that overcomes this shortcoming of neural controllers by conditioning them on an auxiliary input. This advance is enabled by including a feature-wise linear modulation (FiLM) layer. We use model-free reinforcement learning to train quadrotor control policies for the task of navigating through a sequence of waypoints in minimum time. By conditioning the policy on the maximum available thrust or on the viewing direction relative to the next waypoint, a user can regulate the aggressiveness of the quadrotor's flight during deployment. We demonstrate in simulation and in real-world experiments that a single control policy can achieve close to time-optimal flight performance across the entire performance envelope of the robot, reaching speeds of up to 60 km/h and accelerations of up to 4.5g. The ability to guide a learned controller during task execution has implications beyond agile quadrotor flight, as conditioning the control policy on human intent helps safely bring learning-based systems out of the well-defined laboratory environment into the wild.
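To make the conditioning mechanism concrete, the following is a minimal sketch of a FiLM layer of the kind the abstract refers to: a small head maps the auxiliary input (e.g. a thrust limit or a viewing-direction command) to a per-feature scale and shift that modulate the policy's hidden activations. This is the standard FiLM formulation; the PyTorch framing, layer sizes, and all names here are illustrative assumptions, not the authors' implementation.

```python
# Sketch of feature-wise linear modulation (FiLM) for conditioning a policy
# network on an auxiliary input. Names and dimensions are illustrative.
import torch
import torch.nn as nn

class FiLMLayer(nn.Module):
    def __init__(self, cond_dim: int, feature_dim: int):
        super().__init__()
        # One linear head predicts a per-feature scale (gamma) and shift (beta)
        # from the conditioning signal.
        self.to_gamma_beta = nn.Linear(cond_dim, 2 * feature_dim)

    def forward(self, features: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        gamma, beta = self.to_gamma_beta(cond).chunk(2, dim=-1)
        # Feature-wise affine modulation: gamma * h + beta.
        return gamma * features + beta

# Hypothetical usage inside a policy MLP: the hidden state is modulated by
# the user-supplied conditioning signal before the action head.
hidden = torch.randn(1, 128)          # activations from a policy layer
thrust_limit = torch.tensor([[0.7]])  # auxiliary input, e.g. normalized max thrust
film = FiLMLayer(cond_dim=1, feature_dim=128)
modulated = film(hidden, thrust_limit)
```

Because the modulation is a simple per-feature affine transform, the same trained weights can realize a family of behaviors (e.g. more or less aggressive flight) as the user varies the conditioning input at deployment time, without retraining.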