We consider a class of restless bandit problems that finds broad application in stochastic optimization, reinforcement learning, and operations research. In our model, there are $N$ independent $2$-state Markov processes that may be observed and accessed to accrue rewards. The observation is error-prone, i.e., both false alarms and miss detections may occur. Furthermore, the user can only choose a subset of $M~(M<N)$ processes to observe at each discrete time. If a process in state~$1$ is correctly observed, it offers some reward. Due to the partial and imperfect observation model, the system is formulated as a restless multi-armed bandit problem with an information state space of uncountable cardinality. Restless bandit problems with finite state spaces are PSPACE-hard in general. In this paper, we develop a low-complexity algorithm that achieves strong performance for this class of restless bandits. Under certain conditions, we theoretically prove the existence (indexability) of the Whittle index and the equivalence of the resulting index policy to our algorithm. When those conditions do not hold, we show by numerical experiments that our algorithm remains near-optimal in general.
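To make the information-state dynamics concrete, consider a single process and let $\omega_t$ denote the conditional probability that it is in state~$1$ at time $t$; the notation below ($p_{01}$, $p_{11}$ for the transition probabilities, $\epsilon$ for the false-alarm probability, $\delta$ for the miss-detection probability) is illustrative and not taken from the abstract. A standard Bayesian update for such a model reads
\[
\omega_{t+1} =
\begin{cases}
\dfrac{\omega_t(1-\delta)}{\omega_t(1-\delta)+(1-\omega_t)\epsilon}\,p_{11} + \dfrac{(1-\omega_t)\epsilon}{\omega_t(1-\delta)+(1-\omega_t)\epsilon}\,p_{01}, & \text{observed, observation }1,\\[2ex]
\dfrac{\omega_t\,\delta}{\omega_t\delta+(1-\omega_t)(1-\epsilon)}\,p_{11} + \dfrac{(1-\omega_t)(1-\epsilon)}{\omega_t\delta+(1-\omega_t)(1-\epsilon)}\,p_{01}, & \text{observed, observation }0,\\[2ex]
\omega_t\,p_{11} + (1-\omega_t)\,p_{01}, & \text{not observed.}
\end{cases}
\]
Since $\omega_t$ evolves over a continuum and each observation outcome branches the update differently, the reachable information states form an uncountable set, which is why finite-state restless bandit techniques do not directly apply here.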