Showing Your Offline Reinforcement Learning Work: Online Evaluation Budget Matters


October 8, 2021


Vladislav Kurenkov, Sergey Kolesnikov

In recent years, vast progress has been made in Offline Reinforcement Learning (Offline-RL) across decision-making domains ranging from finance to robotics. However, the way new Offline-RL algorithms are compared and reported remains underdeveloped in three respects: (1) the use of an unlimited online evaluation budget for hyperparameter search, (2) the sidestepping of offline policy selection, and (3) ad hoc reporting of performance statistics. In this work, we propose an evaluation technique that addresses these issues, Expected Online Performance, which provides a performance estimate for the best-found policy given a fixed online evaluation budget. Using our approach, we can estimate the number of online evaluations required to surpass the performance of a given behavioral policy. Applying it to several Offline-RL baselines, we find that with a limited online evaluation budget, (1) Behavioral Cloning constitutes a strong baseline across various expert levels and data regimes, and (2) offline uniform policy selection is competitive with value-based approaches. We hope the proposed technique will make it into the toolsets of Offline-RL practitioners and help them arrive at informed conclusions when deploying RL in real-world systems.
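To make the idea of a budget-dependent estimate concrete, the following is a minimal sketch, not the authors' implementation. It assumes uniform policy selection: a pool of trained policies has known online scores, `budget` of them are drawn uniformly with replacement, and the best one is kept. Under that assumption the expected best score is the expected maximum of `budget` i.i.d. draws from the empirical score distribution. The function names `expected_online_performance` and `evaluations_to_surpass`, and the example scores, are hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence


def expected_online_performance(scores: Sequence[float], budget: int) -> float:
    """Expected best score among `budget` policies drawn uniformly with replacement.

    Assumption (not taken from the source): policies are selected uniformly at
    random, so the estimate is the expected maximum of `budget` i.i.d. draws
    from the empirical distribution of `scores`.
    """
    if budget < 1:
        raise ValueError("budget must be at least 1")
    if not scores:
        raise ValueError("scores must be non-empty")
    ordered = sorted(scores)
    n = len(ordered)
    total = 0.0
    for i, value in enumerate(ordered, start=1):
        # Probability that the maximum of `budget` draws is the i-th smallest
        # score: (i / n) ** budget - ((i - 1) / n) ** budget.
        total += value * ((i / n) ** budget - ((i - 1) / n) ** budget)
    return total


def evaluations_to_surpass(
    scores: Sequence[float], baseline: float, max_budget: int
) -> Optional[int]:
    """Smallest budget whose expected best score exceeds `baseline`, if any."""
    for budget in range(1, max_budget + 1):
        if expected_online_performance(scores, budget) > baseline:
            return budget
    return None


if __name__ == "__main__":
    # Hypothetical online returns of policies from a hyperparameter search.
    pool = [10.0, 35.0, 42.0, 58.0, 71.0]
    for k in (1, 2, 5):
        print(k, round(expected_online_performance(pool, k), 2))
    # Hypothetical behavioral-policy return used as the bar to clear.
    print(evaluations_to_surpass(pool, baseline=55.0, max_budget=20))
```

With a budget of 1 the estimate reduces to the mean score of the pool, and it approaches the best score in the pool as the budget grows, which is why results reported under an unlimited budget can look better than what a practitioner with few online evaluations should expect.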

