Evaluating rare but high-stakes events is one of the main challenges in obtaining reliable reinforcement learning policies, especially in large or infinite state/action spaces, where limited scalability forces a prohibitively large number of testing iterations. At the same time, a biased or inaccurate policy evaluation in a safety-critical system can cause unexpected catastrophic failures during deployment. This paper proposes the Accelerated Policy Evaluation (APE) method, which simultaneously uncovers rare events and estimates the rare-event probability in Markov decision processes. APE treats the environment (nature) as an adversarial agent and, through adaptive importance sampling, learns the zero-variance sampling distribution for policy evaluation. Moreover, APE scales to large discrete or continuous spaces by incorporating function approximators. We investigate the convergence properties of APE in the tabular setting. Our empirical studies in multi-agent and single-agent environments show that APE estimates the rare-event probability with smaller bias while using orders of magnitude fewer samples than baselines.
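To make the core idea behind the abstract concrete, the following is a minimal sketch of importance-sampling-based rare-event probability estimation, which the APE method builds on; it is not the authors' algorithm. It assumes a simple Gaussian rare-event model with an illustrative threshold `c` and a shifted Gaussian proposal, and shows how reweighting samples from a proposal concentrated on the rare region yields a far lower-variance estimate than naive Monte Carlo.

```python
# Minimal sketch (not the APE algorithm): importance sampling for
# rare-event probability estimation. We estimate p = P(X > c) for X ~ N(0, 1),
# a stand-in for a rare failure event, by sampling from a shifted proposal
# N(c, 1) and reweighting with the likelihood ratio.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
c = 4.0        # illustrative rare-event threshold; true p = 1 - Phi(c) ~ 3.2e-5
n = 10_000     # number of samples for both estimators

# Naive Monte Carlo: almost no samples reach the rare region, so the
# estimate is often exactly zero and has huge relative variance.
x_mc = rng.standard_normal(n)
p_mc = np.mean(x_mc > c)

# Importance sampling: proposal centered on the rare region, with each
# sample reweighted by the likelihood ratio target_pdf / proposal_pdf.
x_is = rng.normal(loc=c, scale=1.0, size=n)
weights = norm.pdf(x_is) / norm.pdf(x_is, loc=c, scale=1.0)
p_is = np.mean((x_is > c) * weights)

print(f"true     p = {1 - norm.cdf(c):.3e}")
print(f"naive MC p = {p_mc:.3e}")
print(f"IS       p = {p_is:.3e}")
```

In APE the proposal is not fixed in advance; the sampling distribution over the environment's transitions is adapted toward the (unknown) zero-variance distribution during learning, which is what the adversarial treatment of the environment and the adaptive importance sampling in the abstract refer to.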