A tree-based online search algorithm iteratively simulates trajectories and updates the action-values of a set of states stored in a tree structure. It works reasonably well in practice but fails to take advantage of information gathered from similar states. When the action-value function is sufficiently smooth, a simple way to share information among similar states is to perform online learning; policy gradient search provides a practical algorithm for doing so. However, policy gradient search lacks the explicit exploration mechanism that tree-based online search algorithms have. In this paper, we propose an efficient and effective online search algorithm, named Exploratory Policy Gradient Search (ExPoSe), that leverages information sharing among states by directly updating the search-policy parameters while following a well-defined exploration mechanism during the online search. We conduct experiments on several decision-making problems, including Atari games, Sokoban, and Hamiltonian cycle search in sparse graphs, and show that ExPoSe consistently outperforms popular online search algorithms across all domains.
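The combination the abstract describes, online policy-gradient updates of search-policy parameters paired with an explicit exploration mechanism, can be illustrated on a toy problem. The sketch below is not the paper's ExPoSe algorithm (whose details are not given here); it is an assumed minimal instance on a multi-armed bandit, where a softmax policy is updated by a REINFORCE-style gradient while action selection adds a UCB-style visit-count bonus. All function names and parameters are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def exploratory_policy_gradient_search(rewards, steps=1000, lr=0.1, c=0.5, seed=0):
    """Toy sketch (assumed, not the paper's method): update a softmax search
    policy online with a policy-gradient step, while biasing action selection
    with a count-based exploration bonus, UCB-style."""
    rng = np.random.default_rng(seed)
    n = len(rewards)
    theta = np.zeros(n)    # search-policy parameters, updated during search
    counts = np.ones(n)    # visit counts driving the exploration bonus
    for t in range(1, steps + 1):
        # Explicit exploration: bias sampling toward under-visited actions.
        bonus = c * np.sqrt(np.log(t + 1) / counts)
        a = rng.choice(n, p=softmax(theta + bonus))
        counts[a] += 1
        r = rewards[a]
        # Policy-gradient update: grad log pi(a) = onehot(a) - pi.
        pi = softmax(theta)
        grad = -pi
        grad[a] += 1.0
        theta += lr * r * grad
    return softmax(theta)
```

On a two-action problem where only the second action yields reward, the learned policy concentrates on that action while the count bonus keeps the other action occasionally sampled, mirroring the information-sharing-plus-exploration trade-off the abstract motivates.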