用于风险软件和多目标强化学习的蒙特卡洛树搜索算法 (Monte Carlo Tree Search Algorithms for Risk-Aware and Multi-Objective Reinforcement Learning) - 专知论文

会员服务 ·

0

蒙特卡洛树搜索 · 蒙特卡罗 · 总回报 · Learning · 强化学习 ·

2022 年 12 月 6 日

Monte Carlo Tree Search Algorithms for Risk-Aware and Multi-Objective Reinforcement Learning

翻译：用于风险软件和多目标强化学习的蒙特卡洛树搜索算法

Conor F. Hayes,Mathieu Reymond,Diederik M. Roijers,Enda Howley,Patrick Mannion

from arxiv, arXiv admin note: substantial text overlap with arXiv:2102.00966

In many risk-aware and multi-objective reinforcement learning settings, the utility of the user is derived from a single execution of a policy. In these settings, making decisions based on the average future returns is not suitable. For example, in a medical setting a patient may only have one opportunity to treat their illness. Making decisions using just the expected future returns -- known in reinforcement learning as the value -- cannot account for the potential range of adverse or positive outcomes a decision may have. Therefore, we should use the distribution over expected future returns differently to represent the critical information that the agent requires at decision time by taking both the future and accrued returns into consideration. In this paper, we propose two novel Monte Carlo tree search algorithms. Firstly, we present a Monte Carlo tree search algorithm that can compute policies for nonlinear utility functions (NLU-MCTS) by optimising the utility of the different possible returns attainable from individual policy executions, resulting in good policies for both risk-aware and multi-objective settings. Secondly, we propose a distributional Monte Carlo tree search algorithm (DMCTS) which extends NLU-MCTS. DMCTS computes an approximate posterior distribution over the utility of the returns, and utilises Thompson sampling during planning to compute policies in risk-aware and multi-objective settings. Both algorithms outperform the state-of-the-art in multi-objective reinforcement learning for the expected utility of the returns.

翻译：在许多风险意识和多目标强化学习环境中,用户的效用来自单一执行一项政策。在这些环境中,根据平均未来回报率作出决定是不合适的。例如,在医疗环境中,病人可能只有一次治疗其疾病的机会。仅仅使用预期的未来回报率 -- -- 在强化学习中被称为价值 -- -- 无法说明一个决定可能产生的不利或积极结果的范围。因此,我们应该使用对预期未来回报率的分布方式,不同地代表代理人在决策时间需要的关键信息,方法是既考虑未来又考虑累积回报。在本文中,我们提出两个新颖的蒙特卡洛树搜索算法。首先,我们提出蒙特卡洛树搜索算法,通过优化个别政策执行后可能实现的不同回报率的效用来计算非线性公用事业功能(NLU-MCTS),从而产生风险意识风险和多目标环境的良好政策。第二,我们提议采用分配的蒙特卡洛树搜索算法(DMCTS),将NLU-MCTS的效用值考虑在内。DMCTS在服务器上,将一个接近性目标矩阵分析模型的回报率,在对目标矩阵中,对目标矩阵的回收进行。

0

相关内容

蒙特卡洛树搜索

蒙特卡洛树搜索

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

【2022新书】强化学习工业应用，408页pdf

【2022新书】强化学习工业应用，408页pdf

专知会员服务

231+阅读 · 2022年2月3日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

115+阅读 · 2020年4月5日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

163+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

开放知识图谱

1+阅读 · 2022年4月4日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

谷歌足球游戏环境使用介绍

谷歌足球游戏环境使用介绍

CreateAMind

33+阅读 · 2019年6月27日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

间接优化的高效Monte Carlo声传播研究

国家自然科学基金

0+阅读 · 2017年12月31日

含边界层与界面层的输运方程数值算法研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于SURE/PURE准则的图像盲反卷积算法研究

国家自然科学基金

3+阅读 · 2013年12月31日

Livin-Fibronectin分子与生物力学信号偶联介导前列腺癌“抵抗-逃离”转移机制的研究

国家自然科学基金

0+阅读 · 2012年12月31日

MTA2基因调控胃癌侵袭及转移的分子机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

机载InSAR区域网平差方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

单相抑郁症与5-HTR信号转导通路基因-环境作用机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

Lorenz-like系统族的等价性和混沌吸引子几何结构

国家自然科学基金

0+阅读 · 2011年12月31日

刚柔嵌段共聚物自组装行为的快速非格子Monte Carlo模拟研究

国家自然科学基金

0+阅读 · 2009年12月31日

磁性Pickering乳液界面流变学研究

国家自然科学基金

0+阅读 · 2008年12月31日

Performative Reinforcement Learning

Arxiv

0+阅读 · 2023年2月7日

Transfer learning for process design with reinforcement learning

Arxiv

0+阅读 · 2023年2月7日

Models and algorithms for simple disjunctive temporal problems

Arxiv

0+阅读 · 2023年2月6日

Refined Value-Based Offline RL under Realizability and Partial Coverage

Arxiv

0+阅读 · 2023年2月5日

When Data Geometry Meets Deep Function: Generalizing Offline Reinforcement Learning

Arxiv

0+阅读 · 2023年2月3日

Deep Reinforcement Learning for Multi-Agent Interaction

Arxiv

46+阅读 · 2022年8月2日

Transformers are Meta-Reinforcement Learners

Arxiv

15+阅读 · 2022年6月14日

Recent Advances in Reinforcement Learning in Finance

Arxiv

11+阅读 · 2021年12月8日

A Survey of Reinforcement Learning Techniques: Strategies, Recent Development, and Future Directions

A Survey of Reinforcement Learning Techniques: Strategies, Recent Development, and Future Directions

Arxiv

80+阅读 · 2020年1月19日

A Multi-Objective Deep Reinforcement Learning Framework

A Multi-Objective Deep Reinforcement Learning Framework

Arxiv

16+阅读 · 2018年6月27日

VIP会员

文章信息

相关主题

蒙特卡洛树搜索

相关VIP内容

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

76+阅读 · 2022年6月28日

【2022新书】强化学习工业应用，408页pdf

【2022新书】强化学习工业应用，408页pdf

专知会员服务

231+阅读 · 2022年2月3日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

115+阅读 · 2020年4月5日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

163+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

大语言模型智能体强化学习：全景综述

《城市滨海地区：理解复杂多变环境下的指挥控制框架》50页报告

【伯克利博士论文】从推理服务到训练：面向大规模 LLM 智能体的高效系统

美空军“顶点2025”实验：推进AI在C2、动态目标锁定与联盟集成中的应用

相关资讯

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

开放知识图谱

1+阅读 · 2022年4月4日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

谷歌足球游戏环境使用介绍

谷歌足球游戏环境使用介绍

CreateAMind

33+阅读 · 2019年6月27日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Performative Reinforcement Learning

Arxiv

0+阅读 · 2023年2月7日

Transfer learning for process design with reinforcement learning

Arxiv

0+阅读 · 2023年2月7日

Models and algorithms for simple disjunctive temporal problems

Arxiv

0+阅读 · 2023年2月6日

Refined Value-Based Offline RL under Realizability and Partial Coverage

Arxiv

0+阅读 · 2023年2月5日

When Data Geometry Meets Deep Function: Generalizing Offline Reinforcement Learning

Arxiv

0+阅读 · 2023年2月3日

Deep Reinforcement Learning for Multi-Agent Interaction

Arxiv

46+阅读 · 2022年8月2日

Transformers are Meta-Reinforcement Learners

Arxiv

15+阅读 · 2022年6月14日

Recent Advances in Reinforcement Learning in Finance

Arxiv

11+阅读 · 2021年12月8日

A Survey of Reinforcement Learning Techniques: Strategies, Recent Development, and Future Directions

A Survey of Reinforcement Learning Techniques: Strategies, Recent Development, and Future Directions

Arxiv

80+阅读 · 2020年1月19日

A Multi-Objective Deep Reinforcement Learning Framework

A Multi-Objective Deep Reinforcement Learning Framework

Arxiv

16+阅读 · 2018年6月27日

相关基金

间接优化的高效Monte Carlo声传播研究

国家自然科学基金

0+阅读 · 2017年12月31日

含边界层与界面层的输运方程数值算法研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于SURE/PURE准则的图像盲反卷积算法研究

国家自然科学基金

3+阅读 · 2013年12月31日

Livin-Fibronectin分子与生物力学信号偶联介导前列腺癌“抵抗-逃离”转移机制的研究

国家自然科学基金

0+阅读 · 2012年12月31日

MTA2基因调控胃癌侵袭及转移的分子机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

机载InSAR区域网平差方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

单相抑郁症与5-HTR信号转导通路基因-环境作用机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

Lorenz-like系统族的等价性和混沌吸引子几何结构

国家自然科学基金

0+阅读 · 2011年12月31日

刚柔嵌段共聚物自组装行为的快速非格子Monte Carlo模拟研究

国家自然科学基金

0+阅读 · 2009年12月31日

磁性Pickering乳液界面流变学研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员