In the reinforcement learning literature, many algorithms are developed for either Contextual Bandit (CB) or Markov Decision Process (MDP) environments. However, when deploying reinforcement learning algorithms in the real world, even with domain expertise, it is often difficult to know whether it is appropriate to treat a sequential decision-making problem as a CB or an MDP. In other words, do actions affect future states, or only the immediate rewards? Making the wrong assumption about the nature of the environment can lead to inefficient learning, or even prevent the algorithm from ever learning an optimal policy, even with infinite data. In this work we develop an online algorithm that uses a Bayesian hypothesis testing approach to learn the nature of the environment. Our algorithm allows practitioners to incorporate prior knowledge about whether the environment is a CB or an MDP, and effectively interpolates between classical CB and MDP-based algorithms to mitigate the effects of misspecifying the environment. We perform simulations and demonstrate that in CB settings our algorithm achieves lower regret than MDP-based algorithms, while in non-bandit MDP settings our algorithm is able to learn the optimal policy, often achieving regret comparable to that of MDP-based algorithms.
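The abstract does not spell out the hypothesis test itself, so the following is only a minimal illustrative sketch of the general idea: maintain a posterior over the hypothesis "next states depend on (state, action)" (MDP) versus "next states are drawn from a single shared distribution" (CB), updated online from observed transitions via Dirichlet-multinomial marginal likelihoods. All names and parameters here (`EnvironmentTypePosterior`, `alpha`, `prior_mdp`) are assumptions for illustration, not the paper's actual method.

```python
import numpy as np
from scipy.special import gammaln


def log_dirichlet_multinomial(counts, alpha):
    """Log marginal likelihood of categorical counts under a symmetric Dirichlet(alpha) prior."""
    counts = np.asarray(counts, dtype=float)
    k = counts.size
    return (gammaln(k * alpha) - gammaln(k * alpha + counts.sum())
            + np.sum(gammaln(counts + alpha)) - k * gammaln(alpha))


class EnvironmentTypePosterior:
    """Hypothetical online Bayesian test: do actions influence the next state (MDP) or not (CB)?

    Under H_CB the next state is drawn from one distribution shared by all
    (state, action) pairs; under H_MDP each (state, action) pair has its own
    next-state distribution. Both hypotheses use symmetric Dirichlet priors.
    """

    def __init__(self, n_states, n_actions, prior_mdp=0.5, alpha=1.0):
        self.alpha = alpha
        self.log_prior_mdp = np.log(prior_mdp)
        self.log_prior_cb = np.log(1.0 - prior_mdp)
        # Next-state counts per (state, action) pair, and pooled over all pairs.
        self.counts_sa = np.zeros((n_states, n_actions, n_states))
        self.counts_pooled = np.zeros(n_states)

    def update(self, s, a, s_next):
        """Record one observed transition (s, a) -> s_next."""
        self.counts_sa[s, a, s_next] += 1
        self.counts_pooled[s_next] += 1

    def posterior_mdp(self):
        """Posterior probability that the environment is an MDP rather than a CB."""
        n_states, n_actions, _ = self.counts_sa.shape
        log_ml_mdp = sum(log_dirichlet_multinomial(self.counts_sa[s, a], self.alpha)
                         for s in range(n_states) for a in range(n_actions))
        log_ml_cb = log_dirichlet_multinomial(self.counts_pooled, self.alpha)
        log_post = np.array([self.log_prior_cb + log_ml_cb,
                             self.log_prior_mdp + log_ml_mdp])
        log_post -= log_post.max()  # stabilize before exponentiating
        post = np.exp(log_post)
        return post[1] / post.sum()
```

In such a scheme, `prior_mdp` is where a practitioner's prior knowledge about the environment would enter, and the resulting posterior could be used to weight between a CB-style and an MDP-style learner; how the paper actually performs this interpolation is specified in the main text, not here.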