In this paper, we propose Posterior Sampling Reinforcement Learning for Zero-sum Stochastic Games (PSRL-ZSG), the first online learning algorithm that achieves a Bayesian regret bound of $O(HS\sqrt{AT})$ in infinite-horizon zero-sum stochastic games with the average-reward criterion. Here $H$ is an upper bound on the span of the bias function, $S$ is the number of states, $A$ is the number of joint actions, and $T$ is the horizon. We consider the online setting where the opponent cannot be controlled and may follow an arbitrary time-adaptive, history-dependent strategy. This improves on the best existing regret bound of $O(\sqrt[3]{DS^2AT^2})$ by Wei et al. (2017) under the same assumption and matches the theoretical lower bound in $A$ and $T$.
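For intuition, the sketch below illustrates the generic posterior-sampling loop that algorithms of this family follow: sample a game model from a Dirichlet posterior over transitions, compute a maximin policy for the sampled game, play against the uncontrolled opponent, and update the posterior with the observed transitions. This is a minimal illustration under simplifying assumptions, not the paper's algorithm: it substitutes a discounted objective and fixed-length episodes for the average-reward criterion and episode schedule analyzed in the paper, and it assumes the reward function is known. All function and parameter names (`psrl_zsg_sketch`, `horizon`, `gamma`, etc.) are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def solve_matrix_game(Q):
    """Maximin mixed strategy and value of the zero-sum matrix game Q
    (rows: the controlled player's actions, columns: the opponent's)."""
    n_a, n_b = Q.shape
    c = np.zeros(n_a + 1)
    c[-1] = -1.0                                   # maximize the game value v
    A_ub = np.hstack([-Q.T, np.ones((n_b, 1))])    # v <= x^T Q[:, b] for all b
    A_eq = np.ones((1, n_a + 1))
    A_eq[0, -1] = 0.0                              # the strategy x sums to 1
    bounds = [(0.0, 1.0)] * n_a + [(None, None)]   # x is a distribution, v free
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n_b),
                  A_eq=A_eq, b_eq=np.ones(1), bounds=bounds)
    return res.x[:n_a], res.x[-1]

def psrl_zsg_sketch(true_P, r, opponent, episodes=100, horizon=50, gamma=0.98):
    """Posterior-sampling loop for a zero-sum stochastic game (illustrative)."""
    S, A1, A2, _ = true_P.shape
    counts = np.ones((S, A1, A2, S))               # Dirichlet(1,...,1) prior
    s = 0
    for _ in range(episodes):
        # 1) Sample a transition model from the posterior.
        P = np.zeros_like(counts)
        for idx in np.ndindex(S, A1, A2):
            P[idx] = np.random.dirichlet(counts[idx])
        # 2) Maximin policy for the sampled game via Shapley value iteration.
        V = np.zeros(S)
        for _ in range(500):
            Q = r + gamma * (P @ V)                # shape (S, A1, A2)
            V_new = np.array([solve_matrix_game(Q[i])[1] for i in range(S)])
            delta = np.max(np.abs(V_new - V))
            V = V_new
            if delta < 1e-6:
                break
        pi = [solve_matrix_game(r[i] + gamma * (P[i] @ V))[0] for i in range(S)]
        # 3) Play against the uncontrolled opponent; update the posterior.
        for _ in range(horizon):
            p = np.clip(pi[s], 0.0, None)
            p /= p.sum()                           # guard against LP round-off
            a = np.random.choice(A1, p=p)
            b = opponent(s)                        # arbitrary, history-dependent
            s_next = np.random.choice(S, p=true_P[s, a, b])
            counts[s, a, b, s_next] += 1
            s = s_next
    return counts

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A1, A2 = 4, 2, 2
    true_P = rng.dirichlet(np.ones(S), size=(S, A1, A2))   # unknown dynamics
    r = rng.uniform(size=(S, A1, A2))                      # known rewards
    psrl_zsg_sketch(true_P, r, opponent=lambda s: int(rng.integers(A2)))
```

The maximin step solves a zero-sum matrix game at each state by linear programming, which is the standard value-iteration update for discounted zero-sum stochastic games going back to Shapley; the paper's analysis instead works with the average-reward optimality equations, whose bias-span bound $H$ appears in the regret guarantee.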