Model-free continuous control for robot navigation with Deep Reinforcement Learning (DRL), which relies on noisy policies for exploration, is sensitive to the density of rewards. In practice, robots are usually deployed in cluttered environments containing many obstacles and narrow passageways. Designing dense, effective rewards in such environments is challenging, which leads to exploration problems during training. The problem becomes even more serious when tasks are described by temporal logic specifications. This work presents a deep policy gradient algorithm for controlling a robot with unknown dynamics in a cluttered environment when the task is specified as a Linear Temporal Logic (LTL) formula. To overcome the exploration challenge posed by the environment during training, we propose a novel path planning-guided reward scheme that integrates sampling-based methods to effectively complete goal-reaching missions. To facilitate LTL satisfaction, our approach decomposes the LTL mission into sub-goal-reaching tasks that are solved in a distributed manner. Our framework is shown to significantly improve both the performance (effectiveness, efficiency) and the exploration of robots tasked with complex missions in large-scale cluttered environments. A video demonstration is available at https://youtu.be/yMh_NUNWxho.
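To make the reward scheme concrete, the following is a minimal sketch of one way a path planning-guided dense reward could be computed, assuming a reference path (an array of waypoints, e.g., from an RRT*-style sampling-based planner) toward the current sub-goal; the function names, the deviation penalty, and the weight `w_dev` are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def path_progress(position: np.ndarray, path: np.ndarray) -> float:
    """Arc length along `path` up to the waypoint nearest to `position`."""
    seg_lengths = np.linalg.norm(np.diff(path, axis=0), axis=1)
    nearest = int(np.argmin(np.linalg.norm(path - position, axis=1)))
    return float(seg_lengths[:nearest].sum())

def shaped_reward(prev_pos: np.ndarray, curr_pos: np.ndarray,
                  path: np.ndarray, w_dev: float = 0.5) -> float:
    """Dense reward: progress made along the planned path this step,
    minus a penalty for deviating from the path (hypothetical form)."""
    progress = path_progress(curr_pos, path) - path_progress(prev_pos, path)
    deviation = float(np.min(np.linalg.norm(path - curr_pos, axis=1)))
    return progress - w_dev * deviation
```

Under this kind of shaping, the agent receives frequent feedback even far from the sub-goal, since any motion along the planner's geodesic is rewarded, which is what mitigates the sparse-reward exploration issue in cluttered, narrow-passage environments.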