An Adaptive Deep RL Method for Non-Stationary Environments with Piecewise Stable Context


Episode · Segmentation · Inference · Markovian · Grid World

December 24, 2022



Xiaoyu Chen, Xiangming Zhu, Yufeng Zheng, Pushi Zhang, Li Zhao, Wenxue Cheng, Peng Cheng, Yongqiang Xiong, Tao Qin, Jianyu Chen, Tie-Yan Liu

From arXiv, NeurIPS 2022

One of the key challenges in deploying RL to real-world applications is adapting to variations in unknown environment contexts, such as changing terrain in robotic tasks and fluctuating bandwidth in congestion control. Existing work on adaptation to unknown environment contexts either assumes the context is the same for the whole episode or assumes the context variables are Markovian. However, in many real-world applications the environment context stays stable for a stochastic period and then changes abruptly and unpredictably within an episode, producing a segment structure that existing work fails to address. To leverage the segment structure of piecewise stable context in real-world applications, in this paper we propose a Segmented Context Belief Augmented Deep (SeCBAD) RL method. Our method jointly infers the belief distribution over the latent context together with the posterior over segment length, and can therefore perform more accurate belief inference from the data observed within the current context segment. The inferred belief context is used to augment the state, yielding a policy that can adapt to abrupt variations in context. We demonstrate empirically that SeCBAD infers context segment length accurately and outperforms existing methods on a toy grid world environment and on MuJoCo tasks with piecewise stable context.
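The abstract does not specify SeCBAD's inference model, so the block below is not that model. It is a minimal sketch, under loudly stated assumptions, of the idea the abstract describes: keep a joint posterior over the current segment length and the latent context, estimate the context only from data inside the candidate segment, and append the resulting belief to the state. To keep it exact and small, it swaps in a different, well-known technique: Bayesian online changepoint detection (Adams and MacKay, 2007) over a small discrete set of contexts. The constant hazard rate, the Gaussian observation model, the uniform context prior, and the names `SegmentContextFilter` and `augment_state` are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def gaussian_logpdf(y: float, mean: float, std: float) -> float:
    """Log density of N(mean, std^2) at y."""
    z = (y - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))


def normalize_log(log_weights: Sequence[float]) -> List[float]:
    """Convert unnormalized log-weights to probabilities (log-sum-exp trick)."""
    top = max(log_weights)
    exps = [math.exp(v - top) for v in log_weights]
    total = sum(exps)
    return [e / total for e in exps]


class SegmentContextFilter:
    """Hypothetical joint posterior over (current segment length, discrete context).

    Assumed generative model (illustration only, not SeCBAD's):
      * after each step the segment ends with constant probability `hazard`;
      * each new segment draws its context k uniformly from range(len(means));
      * every step emits a scalar observation y ~ N(means[k], std**2).
    """

    def __init__(self, means: Sequence[float], std: float, hazard: float):
        self.means = list(means)
        self.std = std
        self.hazard = hazard
        self.log_prior = [-math.log(len(self.means))] * len(self.means)
        # run_probs[r] = P(current segment holds exactly the last r observations | data)
        self.run_probs: List[float] = [1.0]
        # seg_loglik[r][k] = log-likelihood of the last r observations under context k
        self.seg_loglik: List[List[float]] = [[0.0] * len(self.means)]

    def _context_posterior(self, loglik: Sequence[float]) -> List[float]:
        # p(k | data inside the candidate segment), uniform prior over contexts
        return normalize_log([p + l for p, l in zip(self.log_prior, loglik)])

    def _predictive(self, loglik: Sequence[float], obs_loglik: Sequence[float]) -> float:
        # p(y | segment data) = sum_k p(k | segment data) * p(y | k)
        post = self._context_posterior(loglik)
        return sum(p * math.exp(l) for p, l in zip(post, obs_loglik))

    def update(self, y: float) -> None:
        obs_loglik = [gaussian_logpdf(y, m, self.std) for m in self.means]
        new_probs = [0.0] * (len(self.run_probs) + 1)
        new_loglik = [[0.0] * len(self.means)]  # index 0 keeps probability 0 from now on
        for r, (w, ll) in enumerate(zip(self.run_probs, self.seg_loglik)):
            # Segment continues: y joins the current segment (length r -> r + 1).
            new_probs[r + 1] += w * (1.0 - self.hazard) * self._predictive(ll, obs_loglik)
            new_loglik.append([a + b for a, b in zip(ll, obs_loglik)])
        # Context switch: y opens a fresh segment of length 1 with a newly drawn context.
        fresh = self._predictive([0.0] * len(self.means), obs_loglik)
        new_probs[1] += self.hazard * fresh * sum(self.run_probs)
        total = sum(new_probs)
        if total <= 0.0:
            raise ValueError("observation has zero likelihood under every context")
        self.run_probs = [p / total for p in new_probs]
        self.seg_loglik = new_loglik

    def context_belief(self) -> List[float]:
        # Mix the per-segment-length context posteriors by the segment-length posterior.
        belief = [0.0] * len(self.means)
        for w, ll in zip(self.run_probs, self.seg_loglik):
            if w > 0.0:
                for k, p in enumerate(self._context_posterior(ll)):
                    belief[k] += w * p
        return belief

    def map_segment_length(self) -> int:
        # Most probable number of observations since the last context switch.
        return max(range(len(self.run_probs)), key=lambda r: self.run_probs[r])


def augment_state(state: Sequence[float], belief: Sequence[float]) -> List[float]:
    """Hypothetical policy input: raw state concatenated with the context belief."""
    return list(state) + list(belief)


if __name__ == "__main__":
    filt = SegmentContextFilter(means=[0.0, 3.0], std=0.5, hazard=0.05)
    for y in [0.1, -0.1] * 10 + [3.1, 2.9] * 10:  # abrupt context switch after step 20
        filt.update(y)
    print(filt.map_segment_length(), filt.context_belief())
    print(augment_state([0.5, -0.2], filt.context_belief()))
```

Keeping one context posterior per candidate segment length is what restricts the context estimate to data inside the current segment, while mixing those posteriors by the segment-length posterior accounts for uncertainty about when the last switch happened. In this sketch the run-length arrays grow by one entry per step; a longer-running version would typically truncate them at some maximum segment length.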

