Low-Complexity Algorithm for Restless Bandits with Imperfect Observations - 专知 (Zhuanzhi) Papers


August 9, 2022

Low-Complexity Algorithm for Restless Bandits with Imperfect Observations


Keqin Liu,Richard Weber,Ting Wu,Chengzhong Zhang

We consider a class of restless bandit problems with broad applications in stochastic optimization, reinforcement learning and operations research. There are $N$ independent discrete-time Markov processes, each of which has two possible states: 1 and 0 ("good" and "bad"). Reward accrues only if a process is both in state 1 and observed to be so. The aim is to maximize the expected discounted sum of returns over the infinite horizon, subject to the constraint that only $M$ $(<N)$ processes may be observed at each step. Observation is error-prone: there are known probabilities that state 1 (0) will be observed as 0 (1). From this one knows, at any time $t$, the probability that process $i$ is in state 1. The resulting system may be modeled as a restless multi-armed bandit problem with an information state space of uncountable cardinality. Restless bandit problems with even finite state spaces are PSPACE-hard in general. We propose a novel approach for simplifying the dynamic programming equations of this class of restless bandits and develop a low-complexity algorithm that achieves strong performance and is readily extensible to the general restless bandit model with observation errors. Under certain conditions, we establish the existence of the Whittle index (indexability) and its equivalence to our algorithm. When those conditions do not hold, we show by numerical experiments that our algorithm performs near-optimally across the general parameter space. Finally, we prove theoretically that our algorithm is optimal for homogeneous systems.
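The information state mentioned in the abstract, the probability that a process is in state 1, can be tracked with a Bayes update followed by one Markov transition step. The Python sketch below illustrates that belief update only; it is not the paper's algorithm. The parameter names (`p11`, `p01`, `miss`, `false_alarm`) and the timing convention (observe first, then transition) are assumptions made for illustration and do not appear in the abstract.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArmParams:
    """Hypothetical parameters of one two-state process (names are assumptions)."""
    p11: float          # P(next state = 1 | current state = 1)
    p01: float          # P(next state = 1 | current state = 0)
    miss: float         # P(observe 0 | state = 1)
    false_alarm: float  # P(observe 1 | state = 0)


def propagate(w: float, arm: ArmParams) -> float:
    """Push the belief w = P(state = 1) through one Markov transition."""
    return w * arm.p11 + (1.0 - w) * arm.p01


def posterior(w: float, obs: int, arm: ArmParams) -> float:
    """Bayes update of the belief after an error-prone observation (0 or 1)."""
    if obs == 1:
        num = w * (1.0 - arm.miss)
        den = num + (1.0 - w) * arm.false_alarm
    else:
        num = w * arm.miss
        den = num + (1.0 - w) * (1.0 - arm.false_alarm)
    # If the observation has zero probability under the belief, keep w unchanged.
    return w if den == 0.0 else num / den


def next_belief(w: float, arm: ArmParams, obs: Optional[int] = None) -> float:
    """Belief at the next step; obs is None when the process was not observed."""
    if obs is not None:
        w = posterior(w, obs, arm)
    return propagate(w, arm)


if __name__ == "__main__":
    arm = ArmParams(p11=0.8, p01=0.3, miss=0.1, false_alarm=0.2)
    print(next_belief(0.5, arm))         # not observed
    print(next_belief(0.5, arm, obs=1))  # observed as "good"
    print(next_belief(0.5, arm, obs=0))  # observed as "bad"
```

Because the belief takes values in the continuum $[0, 1]$, the state space of each arm is uncountable, which is the difficulty the abstract refers to.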


