Multi-rotor UAVs suffer from restricted range and flight duration due to limited battery capacity. Autonomous landing on a 2D moving platform offers the possibility to replenish batteries and offload data, thus increasing the utility of the vehicle. Classical approaches rely on accurate, complex, and difficult-to-derive models of the vehicle and the environment. Reinforcement learning (RL) provides an attractive alternative due to its ability to learn a suitable control policy exclusively from data during a training procedure. However, current methods require several hours to train, have limited success rates, and depend on hyperparameters that need to be tuned by trial and error. We address all of these issues in this work. First, we decompose the landing procedure into a sequence of simpler but similar learning tasks. This is enabled by applying two instances of the same RL-based controller, trained for 1D motion, to control the multi-rotor's movement in both the longitudinal and the lateral direction. Second, we introduce a powerful state space discretization technique based on i) kinematic modeling of the moving platform to derive information about the state space topology and ii) structuring the training as a sequential curriculum using transfer learning. Third, we leverage the kinematics model of the moving platform to also derive interpretable hyperparameters for the training process that ensure sufficient maneuverability of the multi-rotor vehicle. The training is performed using the tabular RL method Double Q-Learning. Through extensive simulations, we show that the presented method significantly increases the rate of successful landings while requiring less training time than existing deep RL approaches. Finally, we deploy and demonstrate our algorithm on real hardware. For all evaluation scenarios, we provide statistics on the agent's performance.
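Since the training method named above is tabular Double Q-Learning, a minimal sketch of its update rule may help fix ideas. The table sizes, learning rate, discount factor, and the epsilon-greedy exploration scheme below are generic placeholder choices for illustration, not the values or the discretization derived in this work.

```python
import numpy as np

# Minimal sketch of tabular Double Q-learning (van Hasselt, 2010).
# n_states, n_actions, alpha, and gamma are placeholder assumptions;
# the paper derives its discretization and hyperparameters from a
# kinematics model of the moving platform instead.
n_states, n_actions = 100, 3
alpha, gamma = 0.1, 0.99
rng = np.random.default_rng(0)

Q_a = np.zeros((n_states, n_actions))  # first value table
Q_b = np.zeros((n_states, n_actions))  # second value table

def double_q_update(s, a, r, s_next):
    """One transition (s, a, r, s_next): randomly pick a table to update,
    selecting the greedy action with one table but evaluating it with the
    other, which reduces the maximization bias of standard Q-learning."""
    if rng.random() < 0.5:
        a_star = int(np.argmax(Q_a[s_next]))         # select with table A
        td_target = r + gamma * Q_b[s_next, a_star]  # evaluate with table B
        Q_a[s, a] += alpha * (td_target - Q_a[s, a])
    else:
        a_star = int(np.argmax(Q_b[s_next]))         # select with table B
        td_target = r + gamma * Q_a[s_next, a_star]  # evaluate with table A
        Q_b[s, a] += alpha * (td_target - Q_b[s, a])

def act(s, epsilon=0.1):
    """Epsilon-greedy action on the sum of both tables, a common choice;
    the abstract does not specify the exploration scheme actually used."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q_a[s] + Q_b[s]))
```

In the decomposed setup described above, two instances of such a 1D agent would run side by side, one fed the longitudinal and one the lateral component of the relative motion.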