Performing autonomous exploration is essential for unmanned aerial vehicles (UAVs) operating in unknown environments. Often, these missions begin by building a map of the environment through pure exploration and subsequently using (i.e., exploiting) the generated map for downstream navigation tasks. Accomplishing these navigation tasks in two separate steps is not always possible and can even be disadvantageous for UAVs deployed in outdoor and dynamically changing environments. Current exploration approaches either use a priori human-generated maps or rely on heuristics such as frontier-based exploration. Other approaches use learning but focus only on learning policies for specific tasks, either by relying on sample-inefficient random exploration or by making impractical assumptions about full map availability. In this paper, we develop an adaptive exploration approach that trades off exploration and exploitation in a single step for UAVs searching for areas of interest (AoIs) in unknown environments using Deep Reinforcement Learning (DRL). The proposed approach uses a map segmentation technique to decompose the environment map into smaller, tractable maps. Then, a simple information gain function is repeatedly computed to determine the best target region to search during each iteration of the process. The DDQN and A2C algorithms are extended with a stack of LSTM layers and trained to generate optimal policies for exploration and exploitation, respectively. We tested our approach on three different tasks against four baselines. The results demonstrate that our proposed approach can navigate through randomly generated environments and cover more AoIs in fewer time steps than the baselines.
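The abstract does not specify the segmentation scheme or the exact information gain function. As a minimal sketch, assuming a 2-D occupancy grid with a hypothetical cell encoding (-1 unknown, 0 free, 1 observed AoI), square tiles of a fixed size, and gain defined as the fraction of still-unknown cells in a tile, target-region selection could look like the following. The names `segment_map`, `information_gain`, `select_target`, and the distance weight `lam` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

UNKNOWN = -1  # hypothetical encoding: -1 unknown, 0 free, 1 observed AoI


def segment_map(grid: np.ndarray, tile: int) -> dict[tuple[int, int], np.ndarray]:
    """Split a 2-D grid into non-overlapping tile x tile sub-maps keyed by tile index.

    Edge tiles are smaller when the grid size is not a multiple of `tile`.
    """
    rows, cols = grid.shape
    regions = {}
    for i in range(0, rows, tile):
        for j in range(0, cols, tile):
            regions[(i // tile, j // tile)] = grid[i:i + tile, j:j + tile]
    return regions


def information_gain(region: np.ndarray) -> float:
    """Hypothetical gain: fraction of cells in the region that are still unknown."""
    return float(np.mean(region == UNKNOWN))


def select_target(grid: np.ndarray, tile: int,
                  uav_region: tuple[int, int], lam: float = 0.0) -> tuple[int, int]:
    """Pick the tile maximizing gain minus lam * Manhattan distance (in tiles) from the UAV."""
    regions = segment_map(grid, tile)

    def score(key: tuple[int, int]) -> float:
        dist = abs(key[0] - uav_region[0]) + abs(key[1] - uav_region[1])
        return information_gain(regions[key]) - lam * dist

    return max(regions, key=score)


if __name__ == "__main__":
    grid = np.zeros((8, 8), dtype=int)
    grid[4:, 4:] = UNKNOWN  # only the bottom-right tile is unexplored
    print(select_target(grid, tile=4, uav_region=(0, 0)))
```

With `lam=0` the selection always favors the least-explored tile; a positive `lam` trades gain against travel distance, which is one hypothetical way to bias the search toward nearby regions.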
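The abstract names DDQN as the learner for the exploration policy. The defining step of Double DQN is its bootstrap target: the online network selects the next action and the target network evaluates it, y = r + γ(1 - d) Q_target(s', argmax_a Q_online(s', a)). The NumPy sketch below computes only that target from batched Q-value arrays that are assumed to be already available; the paper's LSTM stack, network architecture, and hyperparameters are not modeled, and the function name `double_dqn_targets` is hypothetical.

```python
import numpy as np


def double_dqn_targets(rewards: np.ndarray, dones: np.ndarray,
                       q_online_next: np.ndarray, q_target_next: np.ndarray,
                       gamma: float = 0.99) -> np.ndarray:
    """Compute Double DQN targets for a batch of transitions.

    rewards, dones: shape (batch,); dones is 1.0 for terminal transitions.
    q_online_next, q_target_next: shape (batch, n_actions), Q-values at s'.
    """
    # The online network chooses the greedy next action...
    best_actions = np.argmax(q_online_next, axis=1)
    # ...and the target network evaluates that chosen action.
    evaluated = q_target_next[np.arange(len(best_actions)), best_actions]
    # No bootstrapping past terminal states.
    return rewards + gamma * (1.0 - dones) * evaluated


if __name__ == "__main__":
    rewards = np.array([1.0, 0.0])
    dones = np.array([0.0, 1.0])
    q_online_next = np.array([[1.0, 2.0], [3.0, 0.0]])
    q_target_next = np.array([[5.0, 4.0], [7.0, 9.0]])
    print(double_dqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.5))
```

In this example the first transition's target is 1 + 0.5 * 4 = 3.0, whereas a vanilla DQN target, which takes the maximum of the target network's values, would give 3.5; decoupling selection from evaluation is what reduces the overestimation bias of standard Q-learning.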