The optimized certainty equivalent (OCE) is a family of risk measures that covers important examples such as entropic risk, conditional value-at-risk, and mean-variance models. In this paper, we propose a new episodic risk-sensitive reinforcement learning formulation based on tabular Markov decision processes with recursive OCEs. We design an efficient learning algorithm for this problem based on value iteration and upper confidence bounds. We derive an upper bound on the regret of the proposed algorithm and also establish a minimax lower bound. Our bounds show that the regret rate achieved by our proposed algorithm has optimal dependence on the number of episodes and the number of actions.
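For context, a minimal sketch of the OCE in its standard Ben-Tal--Teboulle form (the payoff convention and the normalization of the utility $u$ are standard assumptions, not taken from this abstract):

% OCE of a random payoff X, for a concave, nondecreasing utility u
% with u(0) = 0 and 1 in the subdifferential of u at 0 (standard
% normalization assumed here).
\[
  \mathrm{OCE}_u(X) \;=\; \sup_{\lambda \in \mathbb{R}}
  \Bigl\{ \lambda + \mathbb{E}\bigl[\, u(X - \lambda) \,\bigr] \Bigr\}.
\]
% Standard special cases under this convention:
%   u(t) = (1 - e^{-\gamma t})/\gamma gives the entropic risk measure
%     OCE_u(X) = -(1/\gamma) \log \mathbb{E}[e^{-\gamma X}];
%   u(t) = (1/\alpha) \min(t, 0) recovers a CVaR-type measure via the
%     Rockafellar--Uryasev representation;
%   u(t) = t - c t^2 yields the mean-variance criterion
%     \mathbb{E}[X] - c \operatorname{Var}(X).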