Quantum exploration algorithms for multi-armed bandits


December 15, 2020


Daochen Wang, Xuchen You, Tongyang Li, Andrew M. Childs

From arXiv; 18 pages, 1 figure. To appear in the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI 2021).

Identifying the best arm of a multi-armed bandit is a central problem in bandit optimization. We study a quantum computational version of this problem with coherent oracle access to states encoding the reward probabilities of each arm as quantum amplitudes. Specifically, we show that we can find the best arm with fixed confidence using $\tilde{O}\bigl(\sqrt{\sum_{i=2}^n\Delta^{\smash{-2}}_i}\bigr)$ quantum queries, where $\Delta_{i}$ represents the difference between the mean reward of the best arm and the $i^\text{th}$-best arm. This algorithm, based on variable-time amplitude amplification and estimation, gives a quadratic speedup compared to the best possible classical result. We also prove a matching quantum lower bound (up to poly-logarithmic factors).
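To make the scaling in the abstract concrete, the following minimal Python sketch evaluates the gap-dependent quantity $H = \sum_{i=2}^n \Delta_i^{-2}$ and its square root for a given list of mean rewards. It does not implement the quantum algorithm; it only compares the two quantities that, up to poly-logarithmic factors and constants, govern the classical and quantum query counts according to the abstract. The function name `gap_complexities` and the example means are assumptions made for illustration.

```python
import math


def gap_complexities(means):
    """Return (H, sqrt(H)) with H = sum over i >= 2 of Delta_i^(-2).

    `means` holds the mean reward of each arm. Delta_i is the gap between
    the best mean and the i-th best mean. Constants and poly-logarithmic
    factors are deliberately ignored (illustrative assumption).
    """
    ordered = sorted(means, reverse=True)
    if len(ordered) < 2:
        raise ValueError("need at least two arms")
    best = ordered[0]
    gaps = [best - p for p in ordered[1:]]
    if any(g <= 0 for g in gaps):
        raise ValueError("the best arm must be unique")
    h = sum(g ** -2 for g in gaps)
    return h, math.sqrt(h)


# Hypothetical example: three arms with gaps 0.25 and 0.5.
h, root_h = gap_complexities([0.75, 0.5, 0.25])
print(f"sum of inverse squared gaps H = {h}")
print(f"sqrt(H) = {root_h:.3f}")
```

As the gaps shrink, $H$ grows quadratically in the inverse gap while $\sqrt{H}$ grows only linearly, which is the quadratic speedup the abstract refers to.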

