A key challenge in applying reinforcement learning to safety-critical domains is understanding how to balance exploration (needed to attain good performance on the task) with safety (needed to avoid catastrophic failure). Although a growing line of work in reinforcement learning has investigated this area of "safe exploration," most existing techniques either 1) do not guarantee safety during the actual exploration process and/or 2) limit the problem to a priori known and/or deterministic transition dynamics with strong smoothness assumptions. Addressing this gap, we propose Analogous Safe-state Exploration (ASE), an algorithm for provably safe exploration in MDPs with unknown, stochastic dynamics. Our method exploits analogies between state-action pairs to safely learn a near-optimal policy in a PAC-MDP sense. Additionally, ASE guides exploration towards the most task-relevant states, which empirically yields significant improvements in sample efficiency compared to existing methods.