Maillard Sampling: Boltzmann Exploration Done Optimally - 专知 Papers


Multi-armed bandits · MS · Optimizer · Performer · Less

November 5, 2021

Maillard Sampling: Boltzmann Exploration Done Optimally


Jie Bian, Kwang-Sung Jun

The PhD thesis of Maillard (2013) presents a randomized algorithm for the $K$-armed bandit problem. This lesser-known algorithm, which we call Maillard sampling (MS), computes the probability of choosing each arm in closed form, a property that is useful for counterfactual evaluation from bandit-logged data but is lacking in Thompson sampling, a bandit algorithm widely adopted in industry. Motivated by this merit, we revisit MS and perform an improved analysis to show that it achieves both asymptotic optimality and a $\sqrt{KT\log{T}}$ minimax regret bound, where $T$ is the time horizon, matching the performance of the standard asymptotically optimal UCB. We then propose a variant of MS called MS$^+$ that improves the minimax bound to $\sqrt{KT\log{K}}$ without losing asymptotic optimality. MS$^+$ can also be tuned to be aggressive (i.e., to explore less) without losing its theoretical guarantees, a unique feature unavailable in existing bandit algorithms. Our numerical evaluation shows the effectiveness of MS$^+$.
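The abstract does not spell out the sampling rule, so the following is only a minimal sketch under a stated assumption: that arm $a$ is chosen with probability proportional to $\exp(-N_a \hat{\Delta}_a^2 / (2\sigma^2))$, where $N_a$ is the number of pulls of arm $a$, $\hat{\Delta}_a$ is the gap between the best empirical mean and that of arm $a$, and rewards are $\sigma^2$-sub-Gaussian. The function names, the Gaussian reward simulator, and the inverse-propensity estimator are hypothetical illustrations, not the authors' code. The sketch shows why closed-form probabilities matter: each logged pull can carry its exact propensity.

```python
import math
import random


def ms_probabilities(counts, means, sigma=1.0):
    """Closed-form arm probabilities under the ASSUMED rule
    p_a proportional to exp(-N_a * gap_a^2 / (2 * sigma^2))."""
    best = max(means)
    # The empirically best arm has gap 0 and therefore weight 1,
    # so the normalizer is always at least 1.
    weights = [
        math.exp(-n * (best - m) ** 2 / (2.0 * sigma ** 2))
        for n, m in zip(counts, means)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def run_ms(true_means, horizon, sigma=1.0, seed=0):
    """Simulate the assumed rule on Gaussian rewards (hypothetical setup).

    Returns the pull counts and a log of (arm, reward, propensity)."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    sums = [0.0] * k
    log = []
    for t in range(horizon):
        if t < k:
            # Pull every arm once so that empirical means are defined.
            arm, prob = t, 1.0
        else:
            means = [s / n for s, n in zip(sums, counts)]
            probs = ms_probabilities(counts, means, sigma)
            arm = rng.choices(range(k), weights=probs)[0]
            prob = probs[arm]
        reward = rng.gauss(true_means[arm], sigma)
        counts[arm] += 1
        sums[arm] += reward
        log.append((arm, reward, prob))
    return counts, log


def ips_value(log, target_arm):
    """Inverse-propensity estimate of the mean reward of always
    playing target_arm, computed from the logged propensities."""
    total = sum(r / p for a, r, p in log if a == target_arm)
    return total / len(log)


if __name__ == "__main__":
    counts, log = run_ms([0.5, 0.0, -0.5], horizon=2000)
    print("pull counts:", counts)
    print("IPS estimate for arm 0:", ips_value(log, 0))
```

Because the propensity is computed exactly rather than estimated, the logged data can be reused for off-policy estimators such as the inverse-propensity sketch above; with Thompson sampling the same quantity would typically have to be approximated, for example by Monte Carlo.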

