通过Markov决定程序变异性抽象化(技术报告),以正式保证的形式,将RL政策与正式保证一起提炼(技术报告) (Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes (Technical Report)) - 专知论文

会员服务 ·

0

Markov · Learning · 离散化 · 蒸馏 · 回合 ·

2022 年 6 月 14 日

Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes (Technical Report)

翻译：通过Markov决定程序变异性抽象化(技术报告),以正式保证的形式,将RL政策与正式保证一起提炼(技术报告)

Florent Delgrange,Ann Nowé,Guillermo A. Pérez

from arxiv, AAAI 2022, technical report including supplementary material (10 pages main text, 14 pages appendix)

We consider the challenge of policy simplification and verification in the context of policies learned through reinforcement learning (RL) in continuous environments. In well-behaved settings, RL algorithms have convergence guarantees in the limit. While these guarantees are valuable, they are insufficient for safety-critical applications. Furthermore, they are lost when applying advanced techniques such as deep-RL. To recover guarantees when applying advanced RL algorithms to more complex environments with (i) reachability, (ii) safety-constrained reachability, or (iii) discounted-reward objectives, we build upon the DeepMDP framework introduced by Gelada et al. to derive new bisimulation bounds between the unknown environment and a learned discrete latent model of it. Our bisimulation bounds enable the application of formal methods for Markov decision processes. Finally, we show how one can use a policy obtained via state-of-the-art RL to efficiently train a variational autoencoder that yields a discrete latent model with provably approximately correct bisimulation guarantees. Additionally, we obtain a distilled version of the policy for the latent model.

翻译：我们认为,政策简化和核查的挑战是在连续环境中通过强化学习(RL)所学习的政策背景下考虑的。在行为良好的环境中,RL算法在限度内有趋同的保证,虽然这些保证是有价值的,但对于安全关键应用来说是不够的。此外,在应用深RL等先进技术时,它们就失去了这种保证。在将先进的RL算法应用到比较复杂的环境中时,为了恢复这种保证:(一) 可及性,(二) 安全受限制的可及性,或(三) 折扣奖励目标,我们利用Gelada等人引进的DeepMDP框架,在未知环境与所学的离散潜伏模型之间产生新的平衡界限。我们的平衡界限使得能够对Markov决策程序采用正式的方法。最后,我们展示了如何利用通过最先进的RL获得的政策来有效训练可变式自动电解的模型,产生离散的潜伏模型,并有近似准确的刺激保证。此外,我们获得了潜伏模型政策的最新版本。

0

相关内容

Markov

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【新书】人工智能Python代码，227页pdf，Python code for Artificial Intelligence: Foundations of Computational Agents

【新书】人工智能Python代码，227页pdf，Python code for Artificial Intelligence: Foundations of Computational Agents

专知会员服务

103+阅读 · 2020年6月21日

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

115+阅读 · 2020年4月5日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

246+阅读 · 2019年10月21日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Plenary Talk2

【ICIG2021】Latest News & Announcements of the Plenary Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年11月2日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Progranulin在糖尿病肾病足细胞损伤中的保护作用及分子机制

国家自然科学基金

0+阅读 · 2014年12月31日

基于f-x域正则化自回归的非稳态地震数据重建和噪声衰减研究

国家自然科学基金

0+阅读 · 2013年12月31日

Kronheimer-Nakajima quiver 模空间与有理曲面

国家自然科学基金

1+阅读 · 2013年12月31日

高稳定性有序介孔尖晶石AFe2O4(A=Zn,Cu,Co,Ni)的可控制备及可见光催化分解水制氢研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于附加水动力振动反演的深海平台半隐耦合方法及试验研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于四嗪骨架的多孔共价有机框架的合成与性质研究

国家自然科学基金

0+阅读 · 2012年12月31日

生物可降解性多模态纳米微粒构建与TIMP-2、Endostatin联合靶向转运抑制动脉粥样硬化易损斑块血管发生的研究

国家自然科学基金

0+阅读 · 2009年12月31日

Ni3Al基合金单晶生长规律研究

国家自然科学基金

0+阅读 · 2009年12月31日

Unscented卡尔曼滤波算法及其在通信中的应用

国家自然科学基金

0+阅读 · 2008年12月31日

穿膜肽Penetratin及其衍生物的解离动力学研究

国家自然科学基金

0+阅读 · 2008年12月31日

$A Particle-Based Algorithm for Distributional Optimization on \textit{Constrained Domains} via Variational Transport and Mirror Descent$

A Particle-Based Algorithm for Distributional Optimization on \textit{Constrained Domains} via Variational Transport and Mirror Descent

Arxiv

0+阅读 · 2022年8月3日

Debiasing In-Sample Policy Performance for Small-Data, Large-Scale Optimization

Arxiv

0+阅读 · 2022年8月2日

Stable and Interpretable Unrolled Dictionary Learning

Arxiv

0+阅读 · 2022年8月2日

How to Learn from Risk: Explicit Risk-Utility Reinforcement Learning for Efficient and Safe Driving Strategies

Arxiv

0+阅读 · 2022年8月2日

Evolution of Popularity Bias: Empirical Study and Debiasing

Arxiv

0+阅读 · 2022年8月1日

Learning to Control DC Motor for Micromobility in Real Time with Reinforcement Learning

Arxiv

0+阅读 · 2022年7月30日

A Bayesian Approach to Learning Bandit Structure in Markov Decision Processes

Arxiv

0+阅读 · 2022年7月30日

Asymptotic Consistency for Nonconvex Risk-Averse Stochastic Optimization with Infinite Dimensional Decision Spaces

Arxiv

0+阅读 · 2022年7月29日

Automated Reinforcement Learning (AutoRL): A Survey and Open Problems

Automated Reinforcement Learning (AutoRL): A Survey and Open Problems

Arxiv

33+阅读 · 2022年1月11日

Learning Latent Representations to Influence Multi-Agent Interaction

Arxiv

11+阅读 · 2020年11月12日

VIP会员

文章信息

相关主题

相关VIP内容

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

【新书】人工智能Python代码，227页pdf，Python code for Artificial Intelligence: Foundations of Computational Agents

【新书】人工智能Python代码，227页pdf，Python code for Artificial Intelligence: Foundations of Computational Agents

专知会员服务

103+阅读 · 2020年6月21日

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

115+阅读 · 2020年4月5日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

246+阅读 · 2019年10月21日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【博士论文】面向开放式世界的鲁棒智能体

美空军如何利用人工智能提升其兵棋推演能力

【AAAI2026】NeSTR：一种用于大型语言模型的神经-符号可溯因框架，用于时间推理

深度强化学习与模仿学习导论

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Plenary Talk2

【ICIG2021】Latest News & Announcements of the Plenary Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年11月2日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

相关论文

$A Particle-Based Algorithm for Distributional Optimization on \textit{Constrained Domains} via Variational Transport and Mirror Descent$

A Particle-Based Algorithm for Distributional Optimization on \textit{Constrained Domains} via Variational Transport and Mirror Descent

Arxiv

0+阅读 · 2022年8月3日

Debiasing In-Sample Policy Performance for Small-Data, Large-Scale Optimization

Arxiv

0+阅读 · 2022年8月2日

Stable and Interpretable Unrolled Dictionary Learning

Arxiv

0+阅读 · 2022年8月2日

How to Learn from Risk: Explicit Risk-Utility Reinforcement Learning for Efficient and Safe Driving Strategies

Arxiv

0+阅读 · 2022年8月2日

Evolution of Popularity Bias: Empirical Study and Debiasing

Arxiv

0+阅读 · 2022年8月1日

Learning to Control DC Motor for Micromobility in Real Time with Reinforcement Learning

Arxiv

0+阅读 · 2022年7月30日

A Bayesian Approach to Learning Bandit Structure in Markov Decision Processes

Arxiv

0+阅读 · 2022年7月30日

Asymptotic Consistency for Nonconvex Risk-Averse Stochastic Optimization with Infinite Dimensional Decision Spaces

Arxiv

0+阅读 · 2022年7月29日

Automated Reinforcement Learning (AutoRL): A Survey and Open Problems

Automated Reinforcement Learning (AutoRL): A Survey and Open Problems

Arxiv

33+阅读 · 2022年1月11日

Learning Latent Representations to Influence Multi-Agent Interaction

Arxiv

11+阅读 · 2020年11月12日

相关基金

Progranulin在糖尿病肾病足细胞损伤中的保护作用及分子机制

国家自然科学基金

0+阅读 · 2014年12月31日

基于f-x域正则化自回归的非稳态地震数据重建和噪声衰减研究

国家自然科学基金

0+阅读 · 2013年12月31日

Kronheimer-Nakajima quiver 模空间与有理曲面

国家自然科学基金

1+阅读 · 2013年12月31日

高稳定性有序介孔尖晶石AFe2O4(A=Zn,Cu,Co,Ni)的可控制备及可见光催化分解水制氢研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于附加水动力振动反演的深海平台半隐耦合方法及试验研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于四嗪骨架的多孔共价有机框架的合成与性质研究

国家自然科学基金

0+阅读 · 2012年12月31日

生物可降解性多模态纳米微粒构建与TIMP-2、Endostatin联合靶向转运抑制动脉粥样硬化易损斑块血管发生的研究

国家自然科学基金

0+阅读 · 2009年12月31日

Ni3Al基合金单晶生长规律研究

国家自然科学基金

0+阅读 · 2009年12月31日

Unscented卡尔曼滤波算法及其在通信中的应用

国家自然科学基金

0+阅读 · 2008年12月31日

穿膜肽Penetratin及其衍生物的解离动力学研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员