Behavior Estimation from Multi-Source Data for Offline Reinforcement Learning

Offline reinforcement learning · Multi-source · Multi-source data · Reinforcement learning · Learning algorithms

April 11, 2023

Guoxi Zhang, Hisashi Kashima

From arXiv. Accepted by AAAI 2023. This version fixes errors in Fig. 4 and Table 1 as presented in the camera-ready version.

Offline reinforcement learning (RL) has received rising interest due to its appealing data efficiency. The present study addresses behavior estimation, a task that lays the foundation of many offline RL algorithms. Behavior estimation aims to estimate the policy with which the training data were generated. In particular, this work considers a scenario where the data are collected from multiple sources. In this case, because they neglect data heterogeneity, existing approaches to behavior estimation suffer from behavior misspecification. To overcome this drawback, the present study proposes a latent variable model that infers a set of policies from the data, which allows an agent to use as its behavior policy the policy that best describes a particular trajectory. This model provides the agent with a fine-grained characterization of multi-source data and helps it overcome behavior misspecification. This work also proposes a learning algorithm for this model and illustrates its practical usage by extending an existing offline RL algorithm. Lastly, through extensive evaluation, this work confirms the existence of behavior misspecification and the efficacy of the proposed model.
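
The abstract states the idea only at a high level: instead of fitting one behavior policy to all of the data, infer a set of policies and, for each trajectory, use the one that best describes it. The sketch below illustrates that idea under simplifying assumptions of our own: discrete states and actions, tabular policies, and one latent source label per trajectory, fitted with expectation-maximization (EM) and a few random restarts. It is not the authors' model or learning algorithm, which the abstract does not specify; `fit_behavior_mixture`, `sample_source`, and the two-source synthetic data are hypothetical names introduced only for illustration.

```python
import numpy as np


def _log_normalise(log_p):
    """Row-wise log-sum-exp; returns (normalised log rows, per-row log normaliser)."""
    m = log_p.max(axis=1, keepdims=True)
    log_z = m + np.log(np.exp(log_p - m).sum(axis=1, keepdims=True))
    return log_p - log_z, log_z[:, 0]


def fit_behavior_mixture(trajs, n_states, n_actions, k, n_iters=100,
                         n_restarts=5, alpha=1e-2, seed=0):
    """Hypothetical EM for a mixture of k tabular behavior policies.

    Each trajectory gets ONE latent label z (assumption), so its likelihood is
    w_z * prod_t pi_z(a_t | s_t). trajs: list of int arrays of shape (T, 2)
    holding (state, action) pairs. Returns (weights, policies, resp) from the
    restart with the highest marginal log-likelihood; policies[j, s, a] ~ pi_j(a|s).
    """
    rng = np.random.default_rng(seed)
    # Sufficient statistics: per-trajectory visit counts of each (state, action) pair.
    counts = np.zeros((len(trajs), n_states, n_actions))
    for n, traj in enumerate(trajs):
        np.add.at(counts[n], (traj[:, 0], traj[:, 1]), 1.0)

    best = None
    for _restart in range(n_restarts):
        weights = np.full(k, 1.0 / k)
        policies = rng.dirichlet(np.ones(n_actions), size=(k, n_states))
        for _it in range(n_iters):
            # E-step: log p(traj n | z=j) = sum_{s,a} counts[n,s,a] * log pi_j(a|s).
            log_lik = np.einsum("nsa,jsa->nj", counts, np.log(policies))
            log_resp, _ = _log_normalise(np.log(weights) + log_lik)
            resp = np.exp(log_resp)
            # M-step: lightly smoothed mixture weights and responsibility-weighted counts.
            weights = (resp.sum(axis=0) + 1e-6) / (len(trajs) + k * 1e-6)
            weighted = np.einsum("nj,nsa->jsa", resp, counts) + alpha
            policies = weighted / weighted.sum(axis=2, keepdims=True)
        # Final E-step and marginal log-likelihood under the fitted parameters.
        log_lik = np.einsum("nsa,jsa->nj", counts, np.log(policies))
        log_resp, log_z = _log_normalise(np.log(weights) + log_lik)
        if best is None or log_z.sum() > best[0]:
            best = (log_z.sum(), weights, policies, np.exp(log_resp))
    return best[1:]


def sample_source(rng, p_action0, n_trajs, length, n_states=3):
    """Hypothetical source: uniform random states, action 0 with prob. p_action0."""
    trajs = []
    for _ in range(n_trajs):
        states = rng.integers(0, n_states, size=length)
        actions = (rng.random(length) >= p_action0).astype(int)
        trajs.append(np.stack([states, actions], axis=1))
    return trajs


# Two synthetic sources with opposite preferences, pooled without source labels.
rng = np.random.default_rng(1)
trajs = sample_source(rng, 0.9, 50, 20) + sample_source(rng, 0.1, 50, 20)
labels = np.array([0] * 50 + [1] * 50)  # kept only for checking the result

# k = 1: a single behavior policy for all data averages the sources (~0.5 each).
_, single, _ = fit_behavior_mixture(trajs, n_states=3, n_actions=2, k=1)
# k = 2: each trajectory is attributed to the policy that best explains it.
weights, policies, resp = fit_behavior_mixture(trajs, n_states=3, n_actions=2, k=2)
assigned = resp.argmax(axis=1)
behavior_per_traj = policies[assigned]  # per-trajectory behavior policy, shape (100, 3, 2)
print("single-policy pi(a=0|s):", single[0, :, 0].round(2))
print("mixture pi_j(a=0|s):", policies[:, :, 0].round(2))
```

With `k=1` the estimate blends the two sources into a near-uniform policy that describes neither, which illustrates the kind of misspecification that arises when data heterogeneity is ignored. With `k=2` each trajectory is assigned the policy under which it is most likely, and that policy (`behavior_per_traj`) could then serve as its behavior policy in a downstream offline RL algorithm. Hard assignment via `argmax` is one design choice; keeping the soft responsibilities `resp` is another.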
