Standard reinforcement learning (RL) algorithms train agents to maximize given reward functions. However, many real-world applications of RL require agents to also satisfy certain constraints, which may, for example, be motivated by safety concerns. Constrained RL algorithms approach this problem by training agents to maximize given reward functions while respecting \textit{explicitly} defined constraints. In many cases, however, manually designing accurate constraints is a challenging task. In this work, given a reward function and a set of demonstrations from an expert that maximizes this reward function while respecting \textit{unknown} constraints, we propose a framework to learn the most likely constraints that the expert respects. We then train agents to maximize the given reward function subject to the learned constraints. Previous works in this direction have mainly been restricted to tabular settings or to specific types of constraints, or they assume knowledge of the transition dynamics of the environment. In contrast, we empirically show that our framework is able to learn arbitrary \textit{Markovian} constraints in high dimensions in a model-free setting.
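For context, a minimal sketch of the constrained objective referred to above, written in standard constrained-MDP notation (the constraint function $c$ and threshold $\beta$ are illustrative placeholders and not notation introduced in this work):
\begin{align}
\max_{\pi} \;& \mathbb{E}_{\pi}\!\left[\textstyle\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right] \\
\text{s.t.} \;& \mathbb{E}_{\pi}\!\left[\textstyle\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\right] \le \beta,
\end{align}
where $r$ is the given reward function and $c$ encodes the constraints. In the setting considered here, $c$ is unknown and must first be inferred from the expert demonstrations before the agent is trained against it.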