基于已知价值差异的节点分组：一种无损的UCT抽象算法 (Grouping Nodes With Known Value Differences: A Lossless UCT-based Abstraction Algorithm)

A core challenge of Monte Carlo Tree Search (MCTS) is its sample efficiency, which can be improved by grouping state-action pairs and using their aggregate statistics instead of single-node statistics. On the Go Abstractions in Upper Confidence bounds applied to Trees (OGA-UCT) is the state-of-the-art MCTS abstraction algorithm for deterministic environments that builds its abstraction using the Abstractions of State-Action Pairs (ASAP) framework, which aims to detect states and state-action pairs with the same value under optimal play by analysing the search graph. ASAP, however, requires two state-action pairs to have the same immediate reward, which is a rigid condition that limits the number of abstractions that can be found and thereby the sample efficiency. In this paper, we break with the paradigm of grouping value-equivalent states or state-action pairs and instead group states and state-action pairs with possibly different values as long as the difference between their values can be inferred. We call this abstraction framework Known Value Difference Abstractions (KVDA), which infers the value differences by analysis of the immediate rewards and modifies OGA-UCT to use this framework instead. The modification is called KVDA-UCT, which detects significantly more abstractions than OGA-UCT, introduces no additional parameter, and outperforms OGA-UCT on a variety of deterministic environments and parameter settings.

翻译：蒙特卡洛树搜索（MCTS）的核心挑战在于其采样效率，该效率可通过将状态-动作对分组并使用其聚合统计量而非单节点统计量来提升。在确定性环境中，应用于树的Go抽象上置信界（OGA-UCT）是最先进的MCTS抽象算法，它利用状态-动作对抽象（ASAP）框架构建抽象，该框架旨在通过分析搜索图来检测在最优策略下具有相同价值的状态和状态-动作对。然而，ASAP要求两个状态-动作对具有相同的即时奖励，这是一个严格的限制条件，减少了可发现的抽象数量，从而制约了采样效率。本文打破了分组价值等价状态或状态-动作对的范式，转而分组那些价值可能不同但价值差异可推断的状态和状态-动作对。我们将此抽象框架称为已知价值差异抽象（KVDA），它通过分析即时奖励来推断价值差异，并修改OGA-UCT以使用该框架。修改后的算法称为KVDA-UCT，它比OGA-UCT检测到显著更多的抽象，未引入额外参数，并在多种确定性环境和参数设置下优于OGA-UCT。

相关内容

GROUP

关注 1

Group一直是研究计算机支持的合作工作、人机交互、计算机支持的协作学习和社会技术研究的主要场所。该会议将社会科学、计算机科学、工程、设计、价值观以及其他与小组工作相关的多个不同主题的工作结合起来，并进行了广泛的概念化。官网链接：https://group.acm.org/conferences/group20/

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日