活动离线政策选择 (Active Offline Policy Selection) - 专知论文

会员服务 ·

0

INTERACT · 估计/估计量 · 回合 · 可辨认的 · 在线 ·

2021 年 12 月 3 日

Active Offline Policy Selection

翻译：活动离线政策选择

Ksenia Konyushkova,Yutian Chen,Tom Le Paine,Caglar Gulcehre,Cosmin Paduraru,Daniel J Mankowitz,Misha Denil,Nando de Freitas

from arxiv, Presented at NeurIPS 2021

This paper addresses the problem of policy selection in domains with abundant logged data, but with a restricted interaction budget. Solving this problem would enable safe evaluation and deployment of offline reinforcement learning policies in industry, robotics, and recommendation domains among others. Several off-policy evaluation (OPE) techniques have been proposed to assess the value of policies using only logged data. However, there is still a big gap between the evaluation by OPE and the full online evaluation in the real environment. Yet, large amounts of online interactions are often not possible in practice. To overcome this problem, we introduce \emph{active offline policy selection} - a novel sequential decision approach that combines logged data with online interaction to identify the best policy. This approach uses OPE estimates to warm start the online evaluation. Then, in order to utilize the limited environment interactions wisely we decide which policy to evaluate next based on a Bayesian optimization method with a kernel function that represents policy similarity. We use multiple benchmarks with a large number of candidate policies to show that the proposed approach improves upon state-of-the-art OPE estimates and pure online policy evaluation.

翻译：本文探讨了在有大量登录数据但互动预算有限的领域进行政策选择的问题。解决这个问题有助于在工业、机器人和建议等领域安全评估和部署离线强化学习政策。提出了几种离线评估技术,以利用仅登录数据评估政策的价值。然而,在OPE的评价和实际环境中的全面在线评价之间仍然存在着巨大差距。然而,大量在线互动在实践中往往无法实现。为了解决这一问题,我们引入了\emph{主动离线政策选择}——一种新的顺序决策方法,将登录的数据与在线互动结合起来,以确定最佳政策。这一方法利用OPE的估计来启动在线评价。然后,为了明智地利用有限的环境互动,我们决定了以体现政策相似性的贝耶斯优化方法来评估哪项政策。我们使用多种基准和大量候选政策来表明拟议方法改进了OPE的状态评估和纯粹的在线政策评估。

0

相关内容

INTERACT

IFIP TC13 Conference on Human-Computer Interaction是人机交互领域的研究者和实践者展示其工作的重要平台。多年来，这些会议吸引了来自几个国家和文化的研究人员。官网链接：http://interact2019.org/

【2022新书】强化学习工业应用，408页pdf

【2022新书】强化学习工业应用，408页pdf

专知会员服务

231+阅读 · 2022年2月3日

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

115+阅读 · 2020年4月5日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【强化学习资源集合】Awesome Reinforcement Learning

【强化学习资源集合】Awesome Reinforcement Learning

专知会员服务

98+阅读 · 2019年12月23日

【微软Alekh等开放新书】强化学习理论与算法（Reinforcement Learning:Theory and Algorithms），附83页pdf

【微软Alekh等开放新书】强化学习理论与算法（Reinforcement Learning:Theory and Algorithms），附83页pdf

专知会员服务

122+阅读 · 2019年11月24日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【新书】Python编程基础，669页pdf

【新书】Python编程基础，669页pdf

专知会员服务

197+阅读 · 2019年10月10日

浅谈主动学习（Active Learning）

浅谈主动学习（Active Learning）

凡人机器学习

32+阅读 · 2020年6月18日

强化学习扫盲贴：从Q-learning到DQN

强化学习扫盲贴：从Q-learning到DQN

夕小瑶的卖萌屋

52+阅读 · 2019年10月13日

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Offline Reinforcement Learning with Realizability and Single-policy Concentrability

Arxiv

0+阅读 · 2022年2月11日

Reinforcement Learning with Sparse Rewards using Guidance from Offline Demonstration

Arxiv

0+阅读 · 2022年2月9日

Density Constrained Reinforcement Learning

Arxiv

6+阅读 · 2021年6月24日

Hyperparameter Selection for Imitation Learning

Arxiv

7+阅读 · 2021年5月25日

Diversity-Promoting Deep Reinforcement Learning for Interactive Recommendation

Diversity-Promoting Deep Reinforcement Learning for Interactive Recommendation

Arxiv

4+阅读 · 2019年3月19日

Risk-Aware Active Inverse Reinforcement Learning

Risk-Aware Active Inverse Reinforcement Learning

Arxiv

8+阅读 · 2019年1月8日

Addressing the Item Cold-start Problem by Attribute-driven Active Learning

Arxiv

8+阅读 · 2018年5月23日

Deep Reinforcement Learning for Page-wise Recommendations

Arxiv

8+阅读 · 2018年5月7日

Safety-aware Adaptive Reinforcement Learning with Applications to Brushbot Navigation

Arxiv

4+阅读 · 2018年1月29日

Deep Reinforcement Learning for List-wise Recommendations

Arxiv

13+阅读 · 2018年1月5日

VIP会员

文章信息

相关主题

估计/估计量

相关VIP内容

【2022新书】强化学习工业应用，408页pdf

【2022新书】强化学习工业应用，408页pdf

专知会员服务

231+阅读 · 2022年2月3日

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

115+阅读 · 2020年4月5日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【强化学习资源集合】Awesome Reinforcement Learning

【强化学习资源集合】Awesome Reinforcement Learning

专知会员服务

98+阅读 · 2019年12月23日

【微软Alekh等开放新书】强化学习理论与算法（Reinforcement Learning:Theory and Algorithms），附83页pdf

【微软Alekh等开放新书】强化学习理论与算法（Reinforcement Learning:Theory and Algorithms），附83页pdf

专知会员服务

122+阅读 · 2019年11月24日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【新书】Python编程基础，669页pdf

【新书】Python编程基础，669页pdf

专知会员服务

197+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

【NeurIPS2025】语言模型是高效的推理者吗？——来自逻辑编程的视角

美陆军在“艾布拉姆斯”坦克与“布拉德利”步战车上测试“牛蛙”反无人机炮塔

【剑桥大学博士论文】基于注意力的图表示学习

《深度文本哈希综述：基于二进制表示的高效语义文本检索》

相关资讯

浅谈主动学习（Active Learning）

浅谈主动学习（Active Learning）

凡人机器学习

32+阅读 · 2020年6月18日

强化学习扫盲贴：从Q-learning到DQN

强化学习扫盲贴：从Q-learning到DQN

夕小瑶的卖萌屋

52+阅读 · 2019年10月13日

局部学习的特征选择：Local-Learning-Based Feature Selection

局部学习的特征选择：Local-Learning-Based Feature Selection

我爱读PAMI

14+阅读 · 2019年9月20日

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Offline Reinforcement Learning with Realizability and Single-policy Concentrability

Arxiv

0+阅读 · 2022年2月11日

Reinforcement Learning with Sparse Rewards using Guidance from Offline Demonstration

Arxiv

0+阅读 · 2022年2月9日

Density Constrained Reinforcement Learning

Arxiv

6+阅读 · 2021年6月24日

Hyperparameter Selection for Imitation Learning

Arxiv

7+阅读 · 2021年5月25日

Diversity-Promoting Deep Reinforcement Learning for Interactive Recommendation

Diversity-Promoting Deep Reinforcement Learning for Interactive Recommendation

Arxiv

4+阅读 · 2019年3月19日

Risk-Aware Active Inverse Reinforcement Learning

Risk-Aware Active Inverse Reinforcement Learning

Arxiv

8+阅读 · 2019年1月8日

Addressing the Item Cold-start Problem by Attribute-driven Active Learning

Arxiv

8+阅读 · 2018年5月23日

Deep Reinforcement Learning for Page-wise Recommendations

Arxiv

8+阅读 · 2018年5月7日

Safety-aware Adaptive Reinforcement Learning with Applications to Brushbot Navigation

Arxiv

4+阅读 · 2018年1月29日

Deep Reinforcement Learning for List-wise Recommendations

Arxiv

13+阅读 · 2018年1月5日

微信扫码咨询专知VIP会员