Deep model-based Reinforcement Learning (RL) has the potential to substantially improve the sample-efficiency of deep RL. While various challenges have long held it back, a number of papers have recently come out reporting success with deep model-based methods. This is a great development, but the lack of a consistent metric to evaluate such methods makes it difficult to compare various approaches. For example, the common single-task sample-efficiency metric conflates improvements due to model-based learning with other aspects, such as representation learning, making it difficult to assess true progress on model-based RL. To address this, we introduce an experimental setup to evaluate the model-based behavior of RL methods, inspired by work from neuroscience on detecting model-based behavior in humans and animals. Our metric based on this setup, the Local Change Adaptation (LoCA) regret, measures how quickly an RL method adapts to a local change in the environment. Our metric can identify model-based behavior even if the method uses a poor representation, and it provides insight into how close a method's behavior is to optimal model-based behavior. We use our setup to evaluate the model-based behavior of MuZero on a variation of the classic Mountain Car task.
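To make the evaluation protocol concrete, the following Python sketch illustrates one way a LoCA-style adaptation regret could be measured: an agent is first trained under the original reward, the environment is then changed locally, and the shortfall of the agent's return relative to the optimal post-change return is accumulated during adaptation. All names here (`Agent`, `env`, `optimal_return`, `apply_local_reward_change`, the episode counts) are hypothetical placeholders for illustration only, not the paper's actual setup or implementation.

```python
# Hedged sketch of a LoCA-style adaptation-regret measurement.
# The agent/environment interfaces are assumed, not taken from the paper;
# the paper's exact task phases and regret definition may differ.

def run_episode(agent, env, learn=True):
    """Run one episode and return the undiscounted episode return."""
    obs, done, total = env.reset(), False, 0.0
    while not done:
        action = agent.act(obs)
        next_obs, reward, done = env.step(action)
        if learn:
            agent.update(obs, action, reward, next_obs, done)
        obs, total = next_obs, total + reward
    return total


def loca_regret_sketch(agent, env, optimal_return,
                       pretrain_episodes=500, adapt_episodes=100):
    # Phase 1: train to (near-)convergence under the original reward.
    for _ in range(pretrain_episodes):
        run_episode(agent, env)

    # Phase 2: apply a *local* change to the environment (hypothetical API)
    # and accumulate the per-episode shortfall relative to the optimal
    # return achievable in the changed environment.
    env.apply_local_reward_change()
    regret = 0.0
    for _ in range(adapt_episodes):
        regret += max(0.0, optimal_return - run_episode(agent, env))
    return regret
```

Under this sketch, a method exhibiting model-based behavior would update its plan quickly after the local change and accumulate little regret, whereas a purely model-free method would need many episodes of re-learning and accumulate much more.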