NeurWIN:无休止强盗通过深RL的神经Whattle指数网络 (NeurWIN: Neural Whittle Index Network For Restless Bandits Via Deep RL) - 专知论文

会员服务 ·

0

赌博机/老虎机 · Performer · Networking · 优化器 · Bandits ·

2022 年 1 月 20 日

NeurWIN: Neural Whittle Index Network For Restless Bandits Via Deep RL

翻译：NeurWIN:无休止强盗通过深RL的神经Whattle指数网络

Khaled Nakhleh,Santosh Ganji,Ping-Chun Hsieh,I-Hong Hou,Srinivas Shakkottai

from arxiv, Accepted for publication in NeurIPS 2021

Whittle index policy is a powerful tool to obtain asymptotically optimal solutions for the notoriously intractable problem of restless bandits. However, finding the Whittle indices remains a difficult problem for many practical restless bandits with convoluted transition kernels. This paper proposes NeurWIN, a neural Whittle index network that seeks to learn the Whittle indices for any restless bandits by leveraging mathematical properties of the Whittle indices. We show that a neural network that produces the Whittle index is also one that produces the optimal control for a set of Markov decision problems. This property motivates using deep reinforcement learning for the training of NeurWIN. We demonstrate the utility of NeurWIN by evaluating its performance for three recently studied restless bandit problems. Our experiment results show that the performance of NeurWIN is significantly better than other RL algorithms.

翻译：Whittle 指数政策是获得对臭名昭著的无动于衷的土匪问题无异的最佳最佳解决办法的有力工具。然而,找到Whittle指数对于许多实际的无动于衷的土匪来说仍然是一个棘手的问题。本文提议了NeurWIN,这是一个神经惠特尔指数网络,通过利用Whittle指数的数学特性,为任何无动于衷的土匪学习Whittle指数。我们显示,生成Whittle指数的神经网络也是对一组Markov决策问题产生最佳控制的神经网络。这一属性利用深入强化学习对NeurWIN进行训练的动力。我们通过评估NeurWINWIN对最近研究的3个无动于衷的土匪问题的表现,展示了NeurWIN的效用。我们的实验结果表明,NeurWIN的性能比其他RL算法要好得多。

0

相关内容

赌博机/老虎机

赌博机/老虎机

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

【RLChina2020公开课】Lecture-11.pdf【多智能体学习与游戏AI前沿】

【RLChina2020公开课】Lecture-11.pdf【多智能体学习与游戏AI前沿】

专知会员服务

27+阅读 · 2020年8月6日

深度强化学习方法及其在经济学中的应用综述，Comprehensive Review of Deep Reinforcement Learning Methods and Applicationsin Economic

深度强化学习方法及其在经济学中的应用综述，Comprehensive Review of Deep Reinforcement Learning Methods and Applicationsin Economic

专知会员服务

52+阅读 · 2020年4月7日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

专知会员服务

84+阅读 · 2020年2月18日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

讲座报名 | 英国伯明翰大学计算机学院讲师来啦！

讲座报名 | 英国伯明翰大学计算机学院讲师来啦！

THU数据派

0+阅读 · 2021年9月26日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

OpenAI丨深度强化学习关键论文列表

OpenAI丨深度强化学习关键论文列表

中国人工智能学会

17+阅读 · 2018年11月10日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

【推荐】SVM实例教程

【推荐】SVM实例教程

机器学习研究会

17+阅读 · 2017年8月26日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

应用数学暑期学校（2015）

国家自然科学基金

5+阅读 · 2015年7月12日

基于多模态医学图像处理的多维可视化辅助诊疗关键技术研究

国家自然科学基金

3+阅读 · 2014年12月31日

ω3多不饱和脂肪酸代谢产物前列腺素E3(PGE3)和消退素(resolvins)抗前列腺癌机制的研究

国家自然科学基金

0+阅读 · 2014年12月31日

基于构效关系的神经降压素受体1拮抗剂优化

国家自然科学基金

0+阅读 · 2013年12月31日

自适应移动Kriging插值响应面可靠性分析方法及其应用研究

国家自然科学基金

1+阅读 · 2013年12月31日

形状记忆合金的多维本构模型及大尺寸组合管道连接件的设计研究

国家自然科学基金

0+阅读 · 2013年12月31日

流动与传热计算中的高效稳健并行代数多重网格方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

大尺寸颗粒与各向同性湍流双向耦合的直接数值模拟

国家自然科学基金

0+阅读 · 2012年12月31日

非结构网格有限体积方法及其自适应研究

国家自然科学基金

0+阅读 · 2009年12月31日

基于本体的Deep Web搜索技术

国家自然科学基金

2+阅读 · 2009年12月31日

A Sound Up-to-$n$,$δ$ Bisimilarity for PCTL

Arxiv

0+阅读 · 2022年4月20日

Reversible Gromov-Monge Sampler for Simulation-Based Inference

Arxiv

0+阅读 · 2022年4月18日

Risk and optimal policies in bandit experiments

Risk and optimal policies in bandit experiments

Arxiv

0+阅读 · 2022年4月18日

A New Dynamic Algorithm for Densest Subhypergraphs

Arxiv

0+阅读 · 2022年4月17日

A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits

Arxiv

0+阅读 · 2022年4月17日

Approaching sales forecasting using recurrent neural networks and transformers

Arxiv

0+阅读 · 2022年4月16日

Convergence and Implicit Regularization Properties of Gradient Descent for Deep Residual Networks

Arxiv

0+阅读 · 2022年4月14日

Interacting with Non-Cooperative User: A New Paradigm for Proactive Dialogue Policy

Arxiv

0+阅读 · 2022年4月7日

Information-theoretic generalization bounds for black-box learning algorithms

Arxiv

12+阅读 · 2021年10月4日

Scaling Properties of Deep Residual Networks

Arxiv

13+阅读 · 2021年5月25日

VIP会员

文章信息

相关主题

赌博机/老虎机

相关VIP内容

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

【RLChina2020公开课】Lecture-11.pdf【多智能体学习与游戏AI前沿】

【RLChina2020公开课】Lecture-11.pdf【多智能体学习与游戏AI前沿】

专知会员服务

27+阅读 · 2020年8月6日

深度强化学习方法及其在经济学中的应用综述，Comprehensive Review of Deep Reinforcement Learning Methods and Applicationsin Economic

深度强化学习方法及其在经济学中的应用综述，Comprehensive Review of Deep Reinforcement Learning Methods and Applicationsin Economic

专知会员服务

52+阅读 · 2020年4月7日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

专知会员服务

84+阅读 · 2020年2月18日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《军事域人工智能风险、机遇与治理战略指导报告》2025最新76页报告

《杀伤网与精确规模：智能饱和战争时代的战略要务-印度视角》2025最新报告

俄乌冲突的地缘政治与军事教训（万字长文）

《弹药快速效能建模：推进互操作性与技术优势》2025最新26页报告

相关资讯

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

讲座报名 | 英国伯明翰大学计算机学院讲师来啦！

讲座报名 | 英国伯明翰大学计算机学院讲师来啦！

THU数据派

0+阅读 · 2021年9月26日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

OpenAI丨深度强化学习关键论文列表

OpenAI丨深度强化学习关键论文列表

中国人工智能学会

17+阅读 · 2018年11月10日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

【推荐】SVM实例教程

【推荐】SVM实例教程

机器学习研究会

17+阅读 · 2017年8月26日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

A Sound Up-to-$n$,$δ$ Bisimilarity for PCTL

Arxiv

0+阅读 · 2022年4月20日

Reversible Gromov-Monge Sampler for Simulation-Based Inference

Arxiv

0+阅读 · 2022年4月18日

Risk and optimal policies in bandit experiments

Risk and optimal policies in bandit experiments

Arxiv

0+阅读 · 2022年4月18日

A New Dynamic Algorithm for Densest Subhypergraphs

Arxiv

0+阅读 · 2022年4月17日

A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits

Arxiv

0+阅读 · 2022年4月17日

Approaching sales forecasting using recurrent neural networks and transformers

Arxiv

0+阅读 · 2022年4月16日

Convergence and Implicit Regularization Properties of Gradient Descent for Deep Residual Networks

Arxiv

0+阅读 · 2022年4月14日

Interacting with Non-Cooperative User: A New Paradigm for Proactive Dialogue Policy

Arxiv

0+阅读 · 2022年4月7日

Information-theoretic generalization bounds for black-box learning algorithms

Arxiv

12+阅读 · 2021年10月4日

Scaling Properties of Deep Residual Networks

Arxiv

13+阅读 · 2021年5月25日

相关基金

应用数学暑期学校（2015）

国家自然科学基金

5+阅读 · 2015年7月12日

基于多模态医学图像处理的多维可视化辅助诊疗关键技术研究

国家自然科学基金

3+阅读 · 2014年12月31日

ω3多不饱和脂肪酸代谢产物前列腺素E3(PGE3)和消退素(resolvins)抗前列腺癌机制的研究

国家自然科学基金

0+阅读 · 2014年12月31日

基于构效关系的神经降压素受体1拮抗剂优化

国家自然科学基金

0+阅读 · 2013年12月31日

自适应移动Kriging插值响应面可靠性分析方法及其应用研究

国家自然科学基金

1+阅读 · 2013年12月31日

形状记忆合金的多维本构模型及大尺寸组合管道连接件的设计研究

国家自然科学基金

0+阅读 · 2013年12月31日

流动与传热计算中的高效稳健并行代数多重网格方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

大尺寸颗粒与各向同性湍流双向耦合的直接数值模拟

国家自然科学基金

0+阅读 · 2012年12月31日

非结构网格有限体积方法及其自适应研究

国家自然科学基金

0+阅读 · 2009年12月31日

基于本体的Deep Web搜索技术

国家自然科学基金

2+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员