Trusted Approximate Policy Iteration with Bisimulation Metrics


February 6, 2022


Mete Kemertas, Allan Jepson

Bisimulation metrics define a distance measure between states of a Markov decision process (MDP) based on a comparison of reward sequences. Due to this property, they provide theoretical guarantees in value function approximation. In this work, we first prove that bisimulation metrics can be defined via any $p$-Wasserstein metric for $p\geq 1$. Then we describe an approximate policy iteration (API) procedure that uses $\epsilon$-aggregation with $\pi$-bisimulation and prove performance bounds for continuous state spaces. We bound the difference between $\pi$-bisimulation metrics in terms of the change in the policies themselves. Based on these theoretical results, we design an API($\alpha$) procedure that employs conservative policy updates and enjoys better performance bounds than the naive API approach. In addition, we propose a novel trust region approach that circumvents the requirement to explicitly solve a constrained optimization problem. Finally, we provide experimental evidence of improved stability compared to non-conservative alternatives in simulated continuous control.
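Two ingredients named in the abstract can be illustrated with a small sketch. It is not the authors' implementation; it makes the following assumptions. First, the $p$-Wasserstein distance is computed only for the special case of two equally sized, equally weighted samples on the real line, where the optimal coupling matches sorted values. Second, the "conservative policy update" is taken to be a mixture of the old policy and a target policy with weight $\alpha$, which is one common form of such an update and may differ from the paper's API($\alpha$) procedure. The function names `wasserstein_p_1d`, `conservative_update`, and `total_variation` are hypothetical.

```python
from typing import List, Sequence


def wasserstein_p_1d(xs: Sequence[float], ys: Sequence[float], p: float = 1.0) -> float:
    """p-Wasserstein distance between two equal-size, uniformly weighted
    1-D samples. In one dimension the optimal coupling pairs sorted values."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    if p < 1:
        raise ValueError("p must satisfy p >= 1")
    a, b = sorted(xs), sorted(ys)
    mean_cost = sum(abs(u - v) ** p for u, v in zip(a, b)) / len(a)
    return mean_cost ** (1.0 / p)


def conservative_update(
    pi_old: List[List[float]], pi_target: List[List[float]], alpha: float
) -> List[List[float]]:
    """Assumed mixture update: pi_new = (1 - alpha) * pi_old + alpha * pi_target.
    Each row is a distribution over actions for one state."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        [(1.0 - alpha) * o + alpha * t for o, t in zip(row_old, row_target)]
        for row_old, row_target in zip(pi_old, pi_target)
    ]


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Toy tabular policies: 2 states, 2 actions (illustrative numbers only).
pi_old = [[0.5, 0.5], [0.9, 0.1]]
pi_greedy = [[1.0, 0.0], [0.0, 1.0]]
pi_new = conservative_update(pi_old, pi_greedy, alpha=0.1)

# Under the mixture update, the per-state change is at most alpha in
# total variation, so a small alpha keeps the new policy close to the old one.
max_shift = max(total_variation(n, o) for n, o in zip(pi_new, pi_old))
```

Under this mixture form, the per-state total variation between the new and old policy equals $\alpha$ times the total variation between the old and target policies, so it never exceeds $\alpha$. That is the sense in which the step is conservative here.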

