Deploying existing risk-averse approaches in real-world applications remains challenging. The reasons are multi-fold, including the lack of a global optimality guarantee and the need to learn from long consecutive trajectories. Long consecutive trajectories are prone to visiting hazardous states, a major concern in the risk-averse setting. This paper proposes Short-Term VOlatility-controlled Policy Search (STOPS), a novel algorithm that solves risk-averse problems by learning from short-term trajectories instead of long-term ones. Short-term trajectories are more flexible to generate and avoid the danger of hazardous state visitations. Using an actor-critic scheme with an overparameterized two-layer neural network, our algorithm finds a globally optimal policy at a sublinear rate under both proximal policy optimization and natural policy gradient updates, matching the state-of-the-art convergence rate of risk-neutral policy-search methods. The algorithm is evaluated on challenging MuJoCo robot simulation tasks under the mean-variance evaluation metric. Both theoretical analysis and experimental results demonstrate that STOPS achieves state-of-the-art performance among existing risk-averse policy search methods.
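For reference, a minimal sketch of a standard mean-variance objective for risk-averse policy search is given below; this is a common formulation from the literature and may differ in detail from the exact objective optimized by STOPS. Here $r_t$ denotes the per-step reward and $\lambda$ is an assumed risk-aversion coefficient:

$$
\max_{\pi}\; J_{\lambda}(\pi) \;=\; \mathbb{E}_{\pi}\!\Big[\sum_{t=0}^{T-1} r_t\Big] \;-\; \lambda\, \mathrm{Var}_{\pi}\!\Big[\sum_{t=0}^{T-1} r_t\Big], \qquad \lambda \ge 0,
$$

where larger $\lambda$ penalizes return volatility more heavily, trading expected return for lower risk.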