On the Evolution of the MCTS Upper Confidence Bounds for Trees by Means of Evolutionary Algorithms in the Game of Carcassonne - Zhuanzhi Papers


Upper confidence bound · Confidence · Controller · Monte Carlo tree search · Monte Carlo

December 17, 2021


Edgar Galván, Gavin Simpson

From arXiv; 9 pages, 1 figure, 11 tables

Monte Carlo Tree Search (MCTS) is a sampling best-first method to search for optimal decisions. MCTS's popularity is based on its extraordinary results in the challenging two-player game Go, a game considered much harder than Chess and that until very recently was considered infeasible for Artificial Intelligence methods. The success of MCTS depends heavily on how the tree is built, and the selection process plays a fundamental role in this. One particular selection mechanism that has proved to be reliable is based on the Upper Confidence Bounds for Trees, commonly referred to as UCT. UCT attempts to balance exploration and exploitation by considering the values stored in the statistical tree of the MCTS. However, some tuning of the MCTS UCT is necessary for this to work well. In this work, we use Evolutionary Algorithms (EAs) to evolve mathematical expressions with the goal of substituting the UCT mathematical expression. We compare our proposed approach, called Evolution Strategy in MCTS (ES-MCTS), against five variants of the MCTS UCT, three variants of the star-minimax family of algorithms, as well as a random controller in the Game of Carcassonne. We also use a variant of our proposed EA-based controller, dubbed ES partially integrated in MCTS. We show how the ES-MCTS controller is able to outperform all these 10 intelligent controllers, including robust MCTS UCT controllers.
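For readers unfamiliar with UCT, the following is a minimal Python sketch of the textbook UCB1-style selection rule that the abstract refers to. The abstract itself does not state the formula, so the form used here (mean reward plus `c * sqrt(ln(parent_visits) / visits)`), the default constant `c = sqrt(2)`, and the names `uct_score` and `select_child` are assumptions chosen for illustration, not the authors' implementation. The `score` parameter is a hypothetical hook showing where an alternative expression, such as an evolved one, could in principle replace the standard rule.

```python
import math


def uct_score(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """Textbook UCT value of a child node (assumed form, not from the paper)."""
    if visits == 0:
        # Unvisited children get priority so every move is tried once.
        return math.inf
    exploitation = total_reward / visits  # average reward observed so far
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return exploitation + exploration


def select_child(children, parent_visits, score=uct_score):
    """Return the index of the child with the highest score.

    children: list of (total_reward, visits) tuples.
    score: any function with the same first three arguments as uct_score.
    """
    return max(
        range(len(children)),
        key=lambda i: score(children[i][0], children[i][1], parent_visits),
    )


if __name__ == "__main__":
    stats = [(6.0, 10), (1.0, 2)]
    # With exploration, the rarely visited second child is preferred.
    print(select_child(stats, parent_visits=12))
    # With c = 0 the rule is purely greedy on the average reward.
    greedy = lambda r, n, p: uct_score(r, n, p, c=0.0)
    print(select_child(stats, parent_visits=12, score=greedy))
```

The constant `c` is the tuning knob the abstract alludes to: larger values favour rarely visited children, smaller values favour children with a high average reward.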

