Although deep reinforcement learning (DRL) has many success stories, the large-scale deployment of policies learned through these advanced techniques in safety-critical scenarios is hindered by their lack of formal guarantees. Variational Markov Decision Processes (VAE-MDPs) are discrete latent space models that provide a reliable framework for distilling formally verifiable controllers from any RL policy. While the related guarantees address relevant practical aspects such as the satisfaction of performance and safety properties, the VAE approach suffers from several learning flaws (posterior collapse, slow learning speed, poor dynamics estimates), primarily due to the absence of abstraction and representation guarantees to support latent optimization. We introduce the Wasserstein auto-encoded MDP (WAE-MDP), a latent space model that fixes those issues by minimizing a penalized form of the optimal transport between the behaviors of the agent executing the original policy and the distilled policy, for which the formal guarantees apply. Our approach yields bisimulation guarantees while learning the distilled policy, allowing concrete optimization of the abstraction and representation model quality. Our experiments show that, besides distilling policies up to 10 times faster, the latent model quality is indeed better in general. Moreover, we present experiments with a simple time-to-failure verification algorithm on the latent space. The fact that our approach enables such simple verification techniques highlights its applicability.
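For intuition, the "penalized form of the optimal transport" follows the Wasserstein auto-encoder relaxation of Tolstikhin et al. (2018); the display below is a hedged sketch of that generic relaxation adapted to our setting, not the exact WAE-MDP loss, and the symbols $c$, $G_\theta$, $Q_\phi$, $\mathcal{P}_Z$, $D_Z$, and $\lambda$ are illustrative placeholders:
\[
  \min_{\theta,\,\phi}\;
  \mathbb{E}_{x \sim \mathcal{P}_X}\,
  \mathbb{E}_{z \sim Q_\phi(z \mid x)}
  \big[\, c\big(x,\, G_\theta(z)\big) \,\big]
  \;+\;
  \lambda \cdot D_Z\big(Q_\phi(Z),\, \mathcal{P}_Z\big),
\]
where, informally, $\mathcal{P}_X$ stands for the distribution over transitions generated by the agent executing the original policy, $G_\theta$ reconstructs transitions from the discrete latent model, $c$ is a transition-wise reconstruction cost, and the penalty term $D_Z$ drives the aggregated encoding $Q_\phi(Z)$ toward the behavior distribution $\mathcal{P}_Z$ of the latent model under the distilled policy.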