Tiered Reinforcement Learning: Pessimism in the Face of Uncertainty and Constant Regret



September 24, 2022


Jiawei Huang,Li Zhao,Tao Qin,Wei Chen,Nan Jiang,Tie-Yan Liu

From arXiv, 38 pages; NeurIPS 2022

We propose a new learning framework that captures the tiered structure of many real-world user-interaction applications, where users can be divided into two groups based on their tolerance of exploration risk and should be treated separately. In this setting, we simultaneously maintain two policies $\pi^{\text{O}}$ and $\pi^{\text{E}}$: $\pi^{\text{O}}$ ("O" for "online") interacts with the more risk-tolerant users of the first tier and minimizes regret by balancing exploration and exploitation as usual, while $\pi^{\text{E}}$ ("E" for "exploit") focuses exclusively on exploitation for the risk-averse users of the second tier, using the data collected so far. An important question is whether such a separation yields advantages over the standard online setting (i.e., $\pi^{\text{E}}=\pi^{\text{O}}$) for the risk-averse users. We consider the gap-independent and gap-dependent settings separately. For the former, we prove that the separation is not beneficial from a minimax perspective. For the latter, we show that choosing Pessimistic Value Iteration as the exploitation algorithm that produces $\pi^{\text{E}}$ achieves constant regret for risk-averse users, independent of the number of episodes $K$. This is in sharp contrast to the $\Omega(\log K)$ regret of any online RL algorithm in the same setting. Meanwhile, the regret of $\pi^{\text{O}}$ (almost) retains its online optimality and need not be compromised for the success of $\pi^{\text{E}}$.
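The two-policy structure described in the abstract can be illustrated with a deliberately simplified sketch. The code below is not the paper's algorithm: it replaces the episodic MDP with a Bernoulli multi-armed bandit, uses a standard UCB rule for $\pi^{\text{O}}$, and uses a lower-confidence-bound (pessimistic) rule for $\pi^{\text{E}}$ in place of Pessimistic Value Iteration. The function names, the bonus formula, and the choice to record only the pulls of $\pi^{\text{O}}$ as data are all assumptions made for illustration.

```python
import math
import random


def bonus(t, n):
    """Confidence width after n pulls at round t (an assumed, standard UCB-style form)."""
    return math.sqrt(2.0 * math.log(t + 1) / n)


def run_tiered_bandit(means, num_rounds, seed=0):
    """Simulate a two-tier Bernoulli bandit (illustrative stand-in for the MDP setting).

    pi_O acts optimistically (UCB) and is the only policy whose pulls are
    recorded as data. pi_E acts pessimistically (LCB) on that same data.
    Returns (regret_o, regret_e), the cumulative pseudo-regret of each policy.
    """
    rng = random.Random(seed)
    arms = range(len(means))
    best_mean = max(means)
    counts = [0] * len(means)   # pulls of each arm made by pi_O
    sums = [0.0] * len(means)   # total reward observed for each arm
    regret_o = 0.0
    regret_e = 0.0

    for t in range(1, num_rounds + 1):
        # pi_O: try every arm once, then maximize the upper confidence bound.
        untried = [a for a in arms if counts[a] == 0]
        if untried:
            arm_o = untried[0]
        else:
            arm_o = max(arms, key=lambda a: sums[a] / counts[a] + bonus(t, counts[a]))

        # pi_E: maximize the lower confidence bound; arms without data
        # are treated as arbitrarily bad (full pessimism).
        def lcb(a):
            if counts[a] == 0:
                return float("-inf")
            return sums[a] / counts[a] - bonus(t, counts[a])

        arm_e = max(arms, key=lcb)

        # Pseudo-regret compares the chosen arm's mean with the best mean.
        regret_o += best_mean - means[arm_o]
        regret_e += best_mean - means[arm_e]

        # Only pi_O's interaction produces a reward sample that is stored.
        reward = 1.0 if rng.random() < means[arm_o] else 0.0
        counts[arm_o] += 1
        sums[arm_o] += reward

    return regret_o, regret_e


if __name__ == "__main__":
    r_o, r_e = run_tiered_bandit([0.9, 0.1], num_rounds=2000, seed=0)
    print(f"pi_O regret: {r_o:.1f}, pi_E regret: {r_e:.1f}")
```

Running the script with different horizons gives an informal way to compare how the two regrets grow; it is a toy experiment and does not reproduce any result from the paper.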

