The integration of reinforcement learning with unmanned aerial vehicles (UAVs) to achieve autonomous flight has been an active research area in recent years. An important part of this work focuses on obstacle detection and avoidance for UAVs navigating through an environment. Exploration of an unseen environment can be tackled with a Deep Q-Network (DQN). However, value-based exploration with uniform action sampling may lead to redundant states, particularly since such environments often inherently bear sparse rewards. To address this, we present two techniques for improving exploration in UAV obstacle avoidance. The first is a convergence-based approach that uses convergence error to iterate through unexplored actions and a temporal threshold to balance exploration and exploitation. The second is a guidance-based approach that uses a Domain Network with a Gaussian mixture distribution to compare previously seen states against a predicted next state in order to select the next action. Both approaches were implemented and evaluated in multiple 3-D simulation environments of varying complexity, and demonstrate a two-fold improvement in average rewards over the state of the art.
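The convergence-based idea (drive exploration by which actions have not yet converged, then fall back to greedy exploitation after a temporal threshold) can be sketched roughly as below. This is a minimal, hypothetical illustration for a single state of a toy bandit-style problem; the per-action convergence-error proxy, the threshold values, and the update rule are all assumptions for illustration, not the paper's implementation:

```python
import numpy as np

N_ACTIONS = 4
q_values = np.zeros(N_ACTIONS)            # Q estimates for a single state
conv_error = np.full(N_ACTIONS, np.inf)   # hypothetical per-action convergence-error proxy
TEMPORAL_THRESHOLD = 50                   # steps before switching toward pure exploitation
ERROR_TOL = 0.05                          # error below this counts as "converged"

def select_action(step):
    """Prefer actions whose error has not converged; after the temporal
    threshold (or once all actions converge), exploit greedily."""
    unconverged = np.where(conv_error > ERROR_TOL)[0]
    if step < TEMPORAL_THRESHOLD and unconverged.size > 0:
        # explore the least-converged action first
        return int(unconverged[np.argmax(conv_error[unconverged])])
    return int(np.argmax(q_values))

def update(action, reward, alpha=0.5):
    """Standard tabular Q update; the smoothed |TD error| magnitude
    serves as the convergence signal for that action."""
    td_error = reward - q_values[action]
    q_values[action] += alpha * td_error
    prev = 0.0 if np.isinf(conv_error[action]) else conv_error[action]
    conv_error[action] = 0.5 * prev + 0.5 * abs(td_error)

# toy usage: deterministic rewards per action
means = [0.1, 0.9, 0.3, 0.5]
for step in range(200):
    a = select_action(step)
    update(a, means[a])
```

After the exploration phase, the greedy policy settles on the highest-reward action; the temporal threshold bounds how long the agent spends cycling through unexplored actions before exploiting.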