利用已积累的动态模型开展高效的基于优惠的强化学习 (Efficient Preference-Based Reinforcement Learning Using Learned Dynamics Models) - 专知论文

会员服务 ·

0

Learning · Performer · MoDELS · 回合 · 强化学习 ·

2023 年 1 月 11 日

Efficient Preference-Based Reinforcement Learning Using Learned Dynamics Models

翻译：利用已积累的动态模型开展高效的基于优惠的强化学习

Yi Liu,Gaurav Datta,Ellen Novoseller,Daniel S. Brown

from arxiv, NeurIPS 2022 Workshop on Human in the Loop Learning

Preference-based reinforcement learning (PbRL) can enable robots to learn to perform tasks based on an individual's preferences without requiring a hand-crafted reward function. However, existing approaches either assume access to a high-fidelity simulator or analytic model or take a model-free approach that requires extensive, possibly unsafe online environment interactions. In this paper, we study the benefits and challenges of using a learned dynamics model when performing PbRL. In particular, we provide evidence that a learned dynamics model offers the following benefits when performing PbRL: (1) preference elicitation and policy optimization require significantly fewer environment interactions than model-free PbRL, (2) diverse preference queries can be synthesized safely and efficiently as a byproduct of standard model-based RL, and (3) reward pre-training based on suboptimal demonstrations can be performed without any environmental interaction. Our paper provides empirical evidence that learned dynamics models enable robots to learn customized policies based on user preferences in ways that are safer and more sample efficient than prior preference learning approaches.

翻译：以优惠为基础的强化学习(PbRL)可以使机器人学会在个人偏好的基础上执行任务,而不需要手工制作的奖赏功能;然而,现有的方法要么假定可以使用高纤维模拟器或分析模型,要么采取无模式的办法,需要广泛的、可能不安全的在线环境互动;在本文中,我们研究在进行PbRL时使用学习的动态模型的好处和挑战。特别是,我们提供证据表明,一个学习的动态模型在执行PbRL时可以带来以下好处:(1) 优惠的吸引和政策优化需要比没有模型的PbRL少得多的环境互动;(2) 各种优惠询问可以安全有效地合成,作为基于标准模型的RL的副产品;(3) 以亚优化演示为基础的奖励性培训前工作可以在不进行任何环境互动的情况下进行;我们的文件提供了经验证据,表明学习的动态模型能够使机器人学习基于用户偏好的政策,其方式比先前的优惠学习方法更安全、更有效率。

0

相关内容

Learning

自然语言处理顶会NAACL2022最佳论文出炉！

自然语言处理顶会NAACL2022最佳论文出炉！

专知会员服务

43+阅读 · 2022年6月30日

【MIla】一种意识启发规划的基于模型强化学习，A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning

【MIla】一种意识启发规划的基于模型强化学习，A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning

专知会员服务

23+阅读 · 2022年3月19日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

“CVPR 2021 接受论文列表 1663篇论文都在这了

专知会员服务

32+阅读 · 2021年6月12日

【快讯】ICML 2020论文出炉，1088篇上榜，你的paper中了吗？

【快讯】ICML 2020论文出炉，1088篇上榜，你的paper中了吗？

专知会员服务

52+阅读 · 2020年6月1日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

15+阅读 · 2017年9月24日

Runge-Kutta间断Galerkin方法的各向异性自适应方法及其应用

国家自然科学基金

0+阅读 · 2012年12月31日

"β-hCG-ERK1/2-MMP-2"信号通路在卵巢癌侵袭、转移中的作用研究

国家自然科学基金

0+阅读 · 2012年12月31日

ASICs在肿瘤酸化微环境中对MDSCs抑制免疫活性的影响及其机制的研究

国家自然科学基金

0+阅读 · 2012年12月31日

面向气动CFD非线性求解的GPU/CPU混合并行JFNK算法研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于Preisach算子的动力电池开路电压滞回效应建模及其多时间尺度在线估计

国家自然科学基金

0+阅读 · 2012年12月31日

考虑气动力/热与结构不确定性的鲁棒气动弹性优化研究

国家自然科学基金

0+阅读 · 2011年12月31日

宽带放大器用Er3+/Ce3+共掺碲酸盐玻璃及光纤1.53μm波段辐射强度提高研究

国家自然科学基金

0+阅读 · 2011年12月31日

Non-RIP约束的非凸压缩感知方法研究与应用

国家自然科学基金

0+阅读 · 2011年12月31日

Trb3在内质网应激诱导舌鳞癌细胞凋亡中的作用

国家自然科学基金

0+阅读 · 2009年12月31日

TR3相互作用新蛋白机理研究

国家自然科学基金

1+阅读 · 2008年12月31日

Evolutionary Reinforcement Learning: A Survey

Arxiv

0+阅读 · 2023年3月7日

Sampling-based Exploration for Reinforcement Learning of Dexterous Manipulation

Arxiv

0+阅读 · 2023年3月6日

Efficient Domain Coverage for Vehicles with Second-Order Dynamics via Multi-Agent Reinforcement Learning

Arxiv

0+阅读 · 2023年3月6日

Constrained Reinforcement Learning and Formal Verification for Safe Colonoscopy Navigation

Arxiv

0+阅读 · 2023年3月6日

Ensemble Reinforcement Learning: A Survey

Arxiv

0+阅读 · 2023年3月5日

Hindsight States: Blending Sim and Real Task Elements for Efficient Reinforcement Learning

Arxiv

0+阅读 · 2023年3月3日

Handling Sparse Rewards in Reinforcement Learning Using Model Predictive Control

Arxiv

0+阅读 · 2023年3月3日

Toward Risk-based Optimistic Exploration for Cooperative Multi-Agent Reinforcement Learning

Arxiv

0+阅读 · 2023年3月3日

Pretraining in Deep Reinforcement Learning: A Survey

Arxiv

21+阅读 · 2022年11月8日

Deep Reinforcement Learning for Multi-Agent Interaction

Arxiv

45+阅读 · 2022年8月2日

VIP会员

文章信息

相关主题

相关VIP内容

自然语言处理顶会NAACL2022最佳论文出炉！

自然语言处理顶会NAACL2022最佳论文出炉！

专知会员服务

43+阅读 · 2022年6月30日

【MIla】一种意识启发规划的基于模型强化学习，A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning

【MIla】一种意识启发规划的基于模型强化学习，A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning

专知会员服务

23+阅读 · 2022年3月19日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

“CVPR 2021 接受论文列表 1663篇论文都在这了

专知会员服务

32+阅读 · 2021年6月12日

【快讯】ICML 2020论文出炉，1088篇上榜，你的paper中了吗？

【快讯】ICML 2020论文出炉，1088篇上榜，你的paper中了吗？

专知会员服务

52+阅读 · 2020年6月1日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

《小型无人机系统侦测追踪技术：声学、计算机视觉与深度学习融合方案》最新98页

《"牧羊人网格"拦截策略：实现无人机集群可靠拦截的新范式》

光纤无人机：反无人机系统的重大挑战

《作战建模与仿真实证研究》

相关资讯

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

15+阅读 · 2017年9月24日

相关论文

Evolutionary Reinforcement Learning: A Survey

Arxiv

0+阅读 · 2023年3月7日

Sampling-based Exploration for Reinforcement Learning of Dexterous Manipulation

Arxiv

0+阅读 · 2023年3月6日

Efficient Domain Coverage for Vehicles with Second-Order Dynamics via Multi-Agent Reinforcement Learning

Arxiv

0+阅读 · 2023年3月6日

Constrained Reinforcement Learning and Formal Verification for Safe Colonoscopy Navigation

Arxiv

0+阅读 · 2023年3月6日

Ensemble Reinforcement Learning: A Survey

Arxiv

0+阅读 · 2023年3月5日

Hindsight States: Blending Sim and Real Task Elements for Efficient Reinforcement Learning

Arxiv

0+阅读 · 2023年3月3日

Handling Sparse Rewards in Reinforcement Learning Using Model Predictive Control

Arxiv

0+阅读 · 2023年3月3日

Toward Risk-based Optimistic Exploration for Cooperative Multi-Agent Reinforcement Learning

Arxiv

0+阅读 · 2023年3月3日

Pretraining in Deep Reinforcement Learning: A Survey

Arxiv

21+阅读 · 2022年11月8日

Deep Reinforcement Learning for Multi-Agent Interaction

Arxiv

45+阅读 · 2022年8月2日

相关基金

Runge-Kutta间断Galerkin方法的各向异性自适应方法及其应用

国家自然科学基金

0+阅读 · 2012年12月31日

"β-hCG-ERK1/2-MMP-2"信号通路在卵巢癌侵袭、转移中的作用研究

国家自然科学基金

0+阅读 · 2012年12月31日

ASICs在肿瘤酸化微环境中对MDSCs抑制免疫活性的影响及其机制的研究

国家自然科学基金

0+阅读 · 2012年12月31日

面向气动CFD非线性求解的GPU/CPU混合并行JFNK算法研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于Preisach算子的动力电池开路电压滞回效应建模及其多时间尺度在线估计

国家自然科学基金

0+阅读 · 2012年12月31日

考虑气动力/热与结构不确定性的鲁棒气动弹性优化研究

国家自然科学基金

0+阅读 · 2011年12月31日

宽带放大器用Er3+/Ce3+共掺碲酸盐玻璃及光纤1.53μm波段辐射强度提高研究

国家自然科学基金

0+阅读 · 2011年12月31日

Non-RIP约束的非凸压缩感知方法研究与应用

国家自然科学基金

0+阅读 · 2011年12月31日

Trb3在内质网应激诱导舌鳞癌细胞凋亡中的作用

国家自然科学基金

0+阅读 · 2009年12月31日

TR3相互作用新蛋白机理研究

国家自然科学基金

1+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员