A longstanding objective in classical planning is to synthesize policies that generalize across multiple problems from the same domain. In this work, we study generalized policy search-based methods with a focus on the score function used to guide the search over policies. We demonstrate limitations of two score functions and propose a new approach that overcomes these limitations. The main idea behind our approach, Policy-Guided Planning for Generalized Policy Generation (PG3), is that a candidate policy should be used to guide planning on training problems as a mechanism for evaluating that candidate. Theoretical results in a simplified setting give conditions under which PG3 is optimal or admissible. We then study a specific instantiation of policy search where planning problems are PDDL-based and policies are lifted decision lists. Empirical results in six domains confirm that PG3 learns generalized policies more efficiently and effectively than several baselines. Code: https://github.com/ryangpeixu/pg3
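The core idea above, scoring a candidate policy by how cheaply it guides a planner to solutions on training problems, can be sketched in a toy setting. Everything below is a hypothetical illustration, not the paper's actual method: the real PG3 operates on PDDL problems with lifted decision-list policies, whereas this sketch uses a number-line domain, and the names `policy_guided_plan` and `pg3_style_score` are invented for the example. Policy guidance is approximated here as a preferred-operator bias in best-first search.

```python
from heapq import heappush, heappop

# Hypothetical toy domain: states are integers on a number line,
# actions move +1 or -1, and the goal is to reach a target value.
ACTIONS = (+1, -1)

def successors(state):
    return [(a, state + a) for a in ACTIONS]

def policy_guided_plan(policy, init, goal, max_nodes=1000):
    """Best-first search in which the candidate policy's preferred action
    is expanded first (a crude stand-in for policy-guided planning).
    Returns the number of nodes expanded, or None on failure."""
    frontier = [(0, 0, init)]  # (priority, tiebreak, state)
    seen = {init}
    expanded = 0
    tiebreak = 0
    while frontier and expanded < max_nodes:
        _, _, s = heappop(frontier)
        expanded += 1
        if s == goal:
            return expanded
        preferred = policy(s, goal)
        for a, s2 in successors(s):
            if s2 in seen:
                continue
            seen.add(s2)
            tiebreak += 1
            cost = 0 if a == preferred else 1  # policy-preferred action first
            heappush(frontier, (cost, tiebreak, s2))
    return None  # search exhausted its node budget

def pg3_style_score(policy, problems):
    """Score a candidate policy by the total planning effort it induces on
    the training problems (lower is better); failed searches are penalized."""
    total = 0
    for init, goal in problems:
        n = policy_guided_plan(policy, init, goal)
        total += n if n is not None else 10_000
    return total
```

In this sketch a policy that points toward the goal (e.g. `lambda s, g: 1 if s < g else -1`) yields far cheaper guided searches, and hence a better score, than one that points away from it; a policy search procedure would use such scores to rank candidate policies on the training problems.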