Risk Aware and Multi-Objective Decision Making with Distributional Monte Carlo Tree Search - 专知 (Zhuanzhi) Papers

Monte Carlo tree search · Monte Carlo · total return · proposal distribution · learned

February 1, 2021

Risk Aware and Multi-Objective Decision Making with Distributional Monte Carlo Tree Search

Conor F. Hayes, Mathieu Reymond, Diederik M. Roijers, Enda Howley, Patrick Mannion

From arXiv, 8 pages, 4 figures

In many risk-aware and multi-objective reinforcement learning settings, the utility of the user is derived from the single execution of a policy. In these settings, making decisions based on the average future returns is not suitable. For example, in a medical setting a patient may only have one opportunity to treat their illness. When making a decision, just the expected return -- known in reinforcement learning as the value -- cannot account for the potential range of adverse or positive outcomes a decision may have. Our key insight is that we should use the distribution over expected future returns differently to represent the critical information that the agent requires at decision time. In this paper, we propose Distributional Monte Carlo Tree Search, an algorithm that learns a posterior distribution over the utility of the different possible returns attainable from individual policy executions, resulting in good policies for both risk-aware and multi-objective settings. Moreover, our algorithm outperforms the state-of-the-art in multi-objective reinforcement learning for the expected utility of the returns.
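
The abstract's point that the expected return alone hides the range of outcomes can be illustrated with a small sketch. This is not the paper's algorithm. The two treatments, their return distributions, and the square-root utility below are all hypothetical values chosen only for illustration. Both treatments have the same expected return, yet a risk-averse user who experiences a single execution prefers one of them once utility is applied to each possible return before averaging.

```python
import math

# Hypothetical return distributions for one policy execution,
# given as (return, probability) pairs. Not taken from the paper.
safe_treatment = [(5.0, 1.0)]
risky_treatment = [(10.0, 0.5), (0.0, 0.5)]


def utility(ret: float) -> float:
    """Illustrative risk-averse (concave) utility; an assumption."""
    return math.sqrt(ret)


def expected_return(dist):
    """Average return: the 'value' in standard reinforcement learning."""
    return sum(prob * ret for ret, prob in dist)


def expected_utility(dist):
    """Expected utility of the returns: apply utility to each outcome first."""
    return sum(prob * utility(ret) for ret, prob in dist)


for name, dist in [("safe", safe_treatment), ("risky", risky_treatment)]:
    print(name, expected_return(dist), round(expected_utility(dist), 3))
```

Ranking the two options by expected return gives a tie, while ranking by expected utility separates them, which is why the full distribution over returns matters at decision time.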


