Autonomous Blimp Control using Deep Reinforcement Learning

Aerial robot solutions are becoming ubiquitous for an increasing number of tasks. Among the various types of aerial robots, blimps are well suited to long-duration tasks because they are energy efficient, relatively silent, and safe. In our recent work, we developed a software-in-the-loop simulation and a PID-based controller for large blimps in the presence of wind disturbance to address the blimp navigation and control task. However, blimps have a deformable structure, and their dynamics are inherently non-linear and time-delayed, which often results in large trajectory tracking errors. Moreover, the buoyancy of a blimp changes constantly with the ambient temperature and pressure. In the present paper, we explore a deep reinforcement learning (DRL) approach to address these issues. We train only in simulation, while keeping conditions as close as possible to the real-world scenario. We derive a compact state representation to reduce the training time and a discrete action space to enforce control smoothness. Our initial results in simulation show significant potential for DRL in solving the blimp control task, as well as robustness against moderate wind and parameter uncertainty. Extensive experiments are presented to study the robustness of our approach. We also openly provide the source code of our approach.
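One way a discrete action space can enforce control smoothness is to let each action change an actuator command by at most a fixed increment per step. The sketch below illustrates that idea only. The abstract does not specify the action definition, so the names `STEP`, `LIMIT`, `N_ACTUATORS`, `ACTIONS`, and `apply_action`, and all numeric values, are assumptions made for illustration and are not taken from the paper or its source code.

```python
from itertools import product

# Hypothetical values chosen for illustration, not taken from the paper.
STEP = 0.1         # largest change per actuator per control step
LIMIT = 1.0        # normalized actuator range is [-LIMIT, LIMIT]
N_ACTUATORS = 2    # e.g. a thrust channel and an elevator channel

# Each discrete action picks -1, 0, or +1 for every actuator,
# giving 3 ** N_ACTUATORS actions in total.
ACTIONS = list(product((-1, 0, 1), repeat=N_ACTUATORS))


def apply_action(command, action_index):
    """Return the actuator command after applying one discrete action.

    Each component moves by at most STEP and is clipped to the
    allowed range, so consecutive commands cannot jump abruptly.
    """
    deltas = ACTIONS[action_index]
    return tuple(
        max(-LIMIT, min(LIMIT, c + STEP * d))
        for c, d in zip(command, deltas)
    )
```

Because the policy selects an increment rather than an absolute command, the per-step change of every actuator is bounded by `STEP` regardless of which action is chosen. The cost is that the command needs several steps to traverse the full actuator range.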
