The goal of multi-objective reinforcement learning (MORL) is to learn policies that simultaneously optimize multiple competing objectives. In practice, an agent's preferences over the objectives may not be known a priori, and hence, we require policies that can generalize to arbitrary preferences at test time. In this work, we propose a new data-driven setup for offline MORL, where we wish to learn a preference-agnostic agent using only a finite dataset of offline demonstrations of other agents and their preferences. The key contributions of this work are twofold. First, we introduce D4MORL, (D)atasets for MORL that are specifically designed for offline settings. It contains 1.8 million annotated demonstrations obtained by rolling out reference policies that optimize for randomly sampled preferences on 6 MuJoCo environments with 2-3 objectives each. Second, we propose Pareto-Efficient Decision Agents (PEDA), a family of offline MORL algorithms that builds on and extends Decision Transformers via a novel preference-and-return-conditioned policy. Empirically, we show that PEDA closely approximates the behavioral policy on the D4MORL benchmark and, with appropriate conditioning, provides an excellent approximation of the Pareto front, as measured by the hypervolume and sparsity metrics.
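To make the evaluation criteria concrete, below is a minimal sketch of the two metrics mentioned above, hypervolume and sparsity, for a set of multi-objective returns under maximization. This is an illustrative implementation following standard MORL definitions, not the benchmark's own code; the function names, the 2-objective hypervolume routine, and the reference-point convention are assumptions for exposition.

```python
# Minimal sketch (not the D4MORL/PEDA implementation): Pareto-front filtering,
# 2-objective hypervolume, and sparsity, assuming all objectives are maximized.
import numpy as np


def pareto_front(returns: np.ndarray) -> np.ndarray:
    """Keep only the non-dominated points among the achieved returns."""
    keep = []
    for i, p in enumerate(returns):
        # p is dominated if some other point is >= p everywhere and > p somewhere.
        dominated = np.any(
            np.all(returns >= p, axis=1) & np.any(returns > p, axis=1)
        )
        if not dominated:
            keep.append(i)
    return returns[keep]


def hypervolume_2d(front: np.ndarray, ref: np.ndarray) -> float:
    """Area dominated by the front relative to a reference point (2 objectives)."""
    pts = front[np.argsort(-front[:, 0])]  # sort by first objective, descending
    hv, prev_y = 0.0, ref[1]
    for x, y in pts:
        hv += (x - ref[0]) * (y - prev_y)  # add the new rectangular slab
        prev_y = y
    return hv


def sparsity(front: np.ndarray) -> float:
    """Average squared gap between consecutive front points along each objective."""
    if len(front) < 2:
        return 0.0
    s = 0.0
    for j in range(front.shape[1]):
        vals = np.sort(front[:, j])
        s += np.sum(np.diff(vals) ** 2)
    return s / (len(front) - 1)
```

A higher hypervolume indicates a front that dominates a larger region of objective space, while a lower sparsity indicates a denser, more evenly covered front; the two are reported together because either one alone can be gamed (e.g., a single extreme point can have low sparsity but poor hypervolume).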