可计量的蒙特卡洛搜索错误 (Measurable Monte Carlo Search Error Bounds) - 专知论文

会员服务 ·

0

蒙特卡罗 · 赌博机/老虎机 · 蒙特卡罗估计 · 蒙特卡洛树搜索 · 估计/估计量 ·

2021 年 6 月 8 日

Measurable Monte Carlo Search Error Bounds

翻译：可计量的蒙特卡洛搜索错误

John Mern,Mykel J. Kochenderfer

from arxiv, 9 pages, submitted to NeurIPS 2021

Monte Carlo planners can often return sub-optimal actions, even if they are guaranteed to converge in the limit of infinite samples. Known asymptotic regret bounds do not provide any way to measure confidence of a recommended action at the conclusion of search. In this work, we prove bounds on the sub-optimality of Monte Carlo estimates for non-stationary bandits and Markov decision processes. These bounds can be directly computed at the conclusion of the search and do not require knowledge of the true action-value. The presented bound holds for general Monte Carlo solvers meeting mild convergence conditions. We empirically test the tightness of the bounds through experiments on a multi-armed bandit and a discrete Markov decision process for both a simple solver and Monte Carlo tree search.

翻译：蒙特卡洛规划者往往可以返回亚最佳行动,即使它们保证在无限样本的限度内汇合。已知的无症状的遗憾界限在搜索结束时无法提供任何方法来衡量对推荐行动的信心。在这项工作中,我们证明蒙特卡洛对非静态强盗和Markov决策程序的亚最佳估计值的界限。这些界限可以在搜索结束时直接计算,而不需要了解真正的行动价值。提交的界限被锁定给符合温和趋同条件的蒙特卡洛普通解决者。我们通过多臂强盗和离散的Markov决定程序的实验,对一个简单的解决者和蒙特卡洛树的搜索进行实验,对界限的紧密性进行了实验。

0

相关内容

蒙特卡罗

【如何做研究】How to research ，22页ppt

【如何做研究】How to research ，22页ppt

专知会员服务

112+阅读 · 2021年4月17日

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

专知会员服务

69+阅读 · 2021年3月27日

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

策略梯度方法的算子视图，An operator view of policy gradient methods

策略梯度方法的算子视图，An operator view of policy gradient methods

专知会员服务

11+阅读 · 2020年6月23日

【综述】超参数优化:算法和应用综述，Hyper-Parameter Optimization: A Review of Algorithms and Applications

【综述】超参数优化:算法和应用综述，Hyper-Parameter Optimization: A Review of Algorithms and Applications

专知会员服务

57+阅读 · 2020年3月13日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【新书】深度学习搜索，Deep Learning for Search，附327页pdf

【新书】深度学习搜索，Deep Learning for Search，附327页pdf

专知会员服务

213+阅读 · 2020年1月13日

【北京智源大会2019】增强人类智能：从搜索引擎到智能任务助理（ Augmenting Human Intelligence: From Search Engines to Intelligent Task Assistants ）

【北京智源大会2019】增强人类智能：从搜索引擎到智能任务助理（ Augmenting Human Intelligence: From Search Engines to Intelligent Task Assistants ）

专知会员服务

20+阅读 · 2019年11月22日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

17种深度强化学习算法用Pytorch实现

17种深度强化学习算法用Pytorch实现

新智元

31+阅读 · 2019年9月16日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

OpenAI丨深度强化学习关键论文列表

OpenAI丨深度强化学习关键论文列表

中国人工智能学会

17+阅读 · 2018年11月10日

【OpenAI】深度强化学习关键论文列表

【OpenAI】深度强化学习关键论文列表

专知

11+阅读 · 2018年11月10日

【NIPS2018】接收论文列表

【NIPS2018】接收论文列表

专知

5+阅读 · 2018年9月10日

蒙特卡罗方法(Monte Carlo Methods)

蒙特卡罗方法(Monte Carlo Methods)

数据挖掘入门与实战

6+阅读 · 2018年4月22日

神经网络学习率设置

神经网络学习率设置

机器学习研究会

4+阅读 · 2018年3月3日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Elastic Architecture Search for Diverse Tasks with Different Resources

Arxiv

0+阅读 · 2021年8月3日

Generalization bounds for nonparametric regression with $β-$mixing samples

Arxiv

0+阅读 · 2021年8月2日

Asymptotic bias of inexact Markov Chain Monte Carlo methods in high dimension

Arxiv

0+阅读 · 2021年8月2日

Hypocoercivity of Piecewise Deterministic Markov Process-Monte Carlo

Arxiv

0+阅读 · 2021年8月2日

Exact Pareto Optimal Search for Multi-Task Learning: Touring the Pareto Front

Arxiv

0+阅读 · 2021年8月2日

Towards Practical Mean Bounds for Small Samples

Arxiv

0+阅读 · 2021年8月1日

Bounds and Genericity of Sum-Rank-Metric Codes

Arxiv

0+阅读 · 2021年7月31日

Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks

Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks

Arxiv

13+阅读 · 2020年6月24日

Variational Bayesian Reinforcement Learning with Regret Bounds

Arxiv

3+阅读 · 2018年7月25日

The Search Problem in Mixture Models

Arxiv

3+阅读 · 2018年2月24日

VIP会员

文章信息

相关主题

赌博机/老虎机

蒙特卡罗估计

蒙特卡洛树搜索

估计/估计量

相关VIP内容

【如何做研究】How to research ，22页ppt

【如何做研究】How to research ，22页ppt

专知会员服务

112+阅读 · 2021年4月17日

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

INRIA最新「机器学习理论」新书，229页pdf原理性阐述机器学习

专知会员服务

69+阅读 · 2021年3月27日

【干货书】机器学习速查手册，135页pdf

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

策略梯度方法的算子视图，An operator view of policy gradient methods

策略梯度方法的算子视图，An operator view of policy gradient methods

专知会员服务

11+阅读 · 2020年6月23日

【综述】超参数优化:算法和应用综述，Hyper-Parameter Optimization: A Review of Algorithms and Applications

【综述】超参数优化:算法和应用综述，Hyper-Parameter Optimization: A Review of Algorithms and Applications

专知会员服务

57+阅读 · 2020年3月13日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【新书】深度学习搜索，Deep Learning for Search，附327页pdf

【新书】深度学习搜索，Deep Learning for Search，附327页pdf

专知会员服务

213+阅读 · 2020年1月13日

【北京智源大会2019】增强人类智能：从搜索引擎到智能任务助理（ Augmenting Human Intelligence: From Search Engines to Intelligent Task Assistants ）

【北京智源大会2019】增强人类智能：从搜索引擎到智能任务助理（ Augmenting Human Intelligence: From Search Engines to Intelligent Task Assistants ）

专知会员服务

20+阅读 · 2019年11月22日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

热门VIP内容

开通专知VIP会员享更多权益服务

《复杂工程系统模型驱动设计决策支持系统：早期设计阶段挑战》最新138页

《日本陆上自卫队2040年作战方式与未来作战研究》最新23页slides

人工智能作为战争武器

《后勤保障》最新23页

相关资讯

17种深度强化学习算法用Pytorch实现

17种深度强化学习算法用Pytorch实现

新智元

31+阅读 · 2019年9月16日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

OpenAI丨深度强化学习关键论文列表

OpenAI丨深度强化学习关键论文列表

中国人工智能学会

17+阅读 · 2018年11月10日

【OpenAI】深度强化学习关键论文列表

【OpenAI】深度强化学习关键论文列表

专知

11+阅读 · 2018年11月10日

【NIPS2018】接收论文列表

【NIPS2018】接收论文列表

专知

5+阅读 · 2018年9月10日

蒙特卡罗方法(Monte Carlo Methods)

蒙特卡罗方法(Monte Carlo Methods)

数据挖掘入门与实战

6+阅读 · 2018年4月22日

神经网络学习率设置

神经网络学习率设置

机器学习研究会

4+阅读 · 2018年3月3日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

相关论文

Elastic Architecture Search for Diverse Tasks with Different Resources

Arxiv

0+阅读 · 2021年8月3日

Generalization bounds for nonparametric regression with $β-$mixing samples

Arxiv

0+阅读 · 2021年8月2日

Asymptotic bias of inexact Markov Chain Monte Carlo methods in high dimension

Arxiv

0+阅读 · 2021年8月2日

Hypocoercivity of Piecewise Deterministic Markov Process-Monte Carlo

Arxiv

0+阅读 · 2021年8月2日

Exact Pareto Optimal Search for Multi-Task Learning: Touring the Pareto Front

Arxiv

0+阅读 · 2021年8月2日

Towards Practical Mean Bounds for Small Samples

Arxiv

0+阅读 · 2021年8月1日

Bounds and Genericity of Sum-Rank-Metric Codes

Arxiv

0+阅读 · 2021年7月31日

Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks

Minimal Variance Sampling with Provable Guarantees for Fast Training of Graph Neural Networks

Arxiv

13+阅读 · 2020年6月24日

Variational Bayesian Reinforcement Learning with Regret Bounds

Arxiv

3+阅读 · 2018年7月25日

The Search Problem in Mixture Models

Arxiv

3+阅读 · 2018年2月24日

微信扫码咨询专知VIP会员