Reinforcement learning (RL) is a promising approach, but its success in real-world applications remains limited, because ensuring safe exploration and facilitating adequate exploitation is challenging when controlling robotic systems with unknown models and measurement uncertainties. The learning problem becomes even more difficult for complex tasks over continuous state-action spaces. In this paper, we propose a learning-based robotic control framework with four components: (1) we leverage Linear Temporal Logic (LTL) to express complex tasks over infinite horizons, which are translated into a novel automaton structure; (2) we detail an innovative reward scheme for LTL satisfaction with a probabilistic guarantee, and then, by applying a reward shaping technique, we develop a modular policy-gradient architecture that exploits the automaton structure to decompose overall tasks and enhance the performance of learned controllers; (3) by incorporating Gaussian Processes (GPs) to estimate the uncertain system dynamics, we synthesize model-based safe exploration during learning using Exponential Control Barrier Functions (ECBFs), which generalize to systems with high relative degree; (4) to further improve the efficiency of exploration, we use the properties of LTL automata and ECBFs to propose a safe guiding process. Finally, we demonstrate the effectiveness of the framework in several robotic environments. We show that an ECBF-based modular deep RL algorithm achieves near-perfect success rates and maintains safety with high probability during training.
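
To make the ECBF idea in item (3) concrete, the following is a minimal sketch, not the paper's implementation. It assumes a known 1D double integrator (position x, velocity v, acceleration input u) rather than GP-estimated dynamics, a hypothetical position limit `x_max`, and hand-picked gains `k0` and `k1` that place both poles of the barrier dynamics at -2. The barrier h = x_max - x has relative degree 2, so the ECBF condition h'' + k1 h' + k0 h >= 0 reduces to an upper bound on u, and the safety filter simply clips the nominal (for example, RL-proposed) action to that bound.

```python
def ecbf_filter(x, v, u_nominal, x_max=1.0, k0=4.0, k1=4.0):
    """Clip a nominal acceleration so that x stays at or below x_max.

    Assumed dynamics (illustrative only): x' = v, v' = u.
    Barrier: h = x_max - x, so h' = -v and h'' = -u.
    ECBF condition: h'' + k1*h' + k0*h >= 0, i.e. u <= k0*h + k1*h'.
    """
    h = x_max - x
    h_dot = -v
    u_upper = k0 * h + k1 * h_dot
    return min(u_nominal, u_upper)


def simulate(u_nominal=1.0, steps=1000, dt=0.01, x_max=1.0):
    """Forward-Euler rollout from rest at x = 0 with the filter applied."""
    x, v = 0.0, 0.0
    xs = [x]
    for _ in range(steps):
        u = ecbf_filter(x, v, u_nominal, x_max=x_max)
        # Explicit Euler step: position uses the old velocity.
        x, v = x + v * dt, v + u * dt
        xs.append(x)
    return xs


if __name__ == "__main__":
    trajectory = simulate()
    print("largest position reached:", max(trajectory))
```

In this scalar case the usual quadratic program has a closed-form solution (a single `min`). With vector inputs or several barriers, a QP solver would take its place, and with learned dynamics the bound would also need to account for the model's uncertainty.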