In this paper we present a model for the hidden Markovian bandit problem with linear rewards. In contrast to current work on Markovian bandits, we do not assume that the state is known to the decision maker before the decision is made. Furthermore, we assume structural side information: the decision maker knows in advance that there are two types of hidden states, one common to all arms that evolves according to a Markovian distribution, and one unique to each arm that is drawn from an arm-specific i.i.d. process. We present an algorithm and a regret analysis for this problem. Surprisingly, we can recover the hidden states and maintain logarithmic regret when the action set is a convex polytope. Furthermore, we show that the structural side information leads to expected regret that does not depend on the number of extreme points of the action space. We therefore obtain practical solutions even for high-dimensional problems.
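To make the model concrete, the following is a minimal simulation sketch of the reward structure described above. All sizes, parameter values, and the random placeholder policy are illustrative assumptions, not taken from the paper: a common hidden state evolves as a Markov chain, each arm has its own i.i.d. hidden component, and the observed reward is linear in these hidden quantities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem instance (dimensions and values are assumptions).
n_arms, horizon = 3, 5
P = np.array([[0.9, 0.1],              # transition matrix of the common hidden state
              [0.2, 0.8]])
common_theta = np.array([1.0, -1.0])   # reward contribution of each common state
arm_means = np.array([0.2, 0.5, 0.8])  # means of the per-arm i.i.d. hidden components

s = 0  # common hidden state; unobserved by the decision maker
for t in range(horizon):
    z = rng.normal(arm_means, 0.1)     # per-arm i.i.d. hidden components, redrawn each round
    a = rng.integers(n_arms)           # placeholder policy: play a uniformly random arm
    reward = common_theta[s] + z[a]    # reward is linear in the hidden states
    s = rng.choice(2, p=P[s])          # common state evolves according to the Markov chain
```

The learner observes only `reward`, never `s` or `z`; the paper's contribution is an algorithm that exploits the structural side information (common Markovian state plus arm-specific i.i.d. states) in place of the random policy sketched here.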