Non-Stationary Off-Policy Optimization - Zhuanzhi (专知) Papers

April 4, 2021

Non-Stationary Off-Policy Optimization

Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed

From arXiv, AISTATS 2021; 16 pages, 2 figures

Off-policy learning is a framework for evaluating and optimizing policies without deploying them, from data collected by another policy. Real-world environments are typically non-stationary and the offline learned policies should adapt to these changes. To address this challenge, we study the novel problem of off-policy optimization in piecewise-stationary contextual bandits. Our proposed solution has two phases. In the offline learning phase, we partition logged data into categorical latent states and learn a near-optimal sub-policy for each state. In the online deployment phase, we adaptively switch between the learned sub-policies based on their performance. This approach is practical and analyzable, and we provide guarantees on both the quality of off-policy optimization and the regret during online deployment. To show the effectiveness of our approach, we compare it to state-of-the-art baselines on both synthetic and real-world datasets. Our approach outperforms methods that act only on observed context.
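The two phases described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not specify the estimator, the latent-state model, or the switching rule. The sketch assumes the logged data has already been split into segments (one per latent state), uses a plain inverse-propensity-scoring (IPS) estimate to pick a sub-policy per segment from a small candidate set, and uses a standard Exp3-style bandit as a stand-in for the online switching step. All names (`ips_value`, `best_policy_per_segment`, `Exp3Switcher`) are hypothetical.

```python
import math
import random


def ips_value(logged, policy):
    """IPS estimate of a deterministic policy's mean reward.

    logged: list of (context, action, reward, propensity) tuples, where
    propensity is the logging policy's probability of the logged action.
    """
    total = 0.0
    for context, action, reward, propensity in logged:
        # Only rounds where the evaluated policy agrees with the log count.
        if policy(context) == action:
            total += reward / propensity
    return total / len(logged)


def best_policy_per_segment(segments, candidates):
    """Offline phase (assumed form): one sub-policy per pre-computed segment."""
    return [max(candidates, key=lambda pi: ips_value(seg, pi)) for seg in segments]


class Exp3Switcher:
    """Online phase stand-in: Exp3 over sub-policies, rewards in [0, 1]."""

    def __init__(self, n_policies, gamma=0.1, seed=0):
        self.n = n_policies
        self.gamma = gamma
        self.weights = [1.0] * n_policies
        self.rng = random.Random(seed)

    def probabilities(self):
        total = sum(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / self.n
                for w in self.weights]

    def select(self):
        """Sample the index of the sub-policy to play this round."""
        return self.rng.choices(range(self.n), weights=self.probabilities())[0]

    def update(self, chosen, reward):
        """Importance-weighted exponential update for the played sub-policy."""
        p = self.probabilities()[chosen]
        self.weights[chosen] *= math.exp(self.gamma * reward / (p * self.n))
        # Rescale so weights stay bounded over long runs.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]


def always_0(context):
    return 0


def always_1(context):
    return 1


# Two toy segments logged under a uniform policy over two actions.
segment_a = [(0, 1, 1.0, 0.5), (0, 0, 0.0, 0.5)]  # action 1 pays off
segment_b = [(0, 0, 1.0, 0.5), (0, 1, 0.0, 0.5)]  # action 0 pays off
sub_policies = best_policy_per_segment([segment_a, segment_b],
                                       [always_0, always_1])

switcher = Exp3Switcher(n_policies=len(sub_policies))
for _ in range(50):
    switcher.update(0, 1.0)  # pretend sub-policy 0 keeps earning reward 1
```

Plain Exp3 targets a fixed best arm, so a variant designed for changing environments would be a closer match to the piecewise-stationary setting; it is used here only because it is short and well known.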
