Zero-sum risk-sensitive continuous-time stochastic games with unbounded payoff and transition rates and Borel spaces


Extensibility · Markov chains · Reward functions · Nash equilibrium · Approximation

March 6, 2021


Junyu Zhang, Xianping Guo, Li Xia

We study a finite-horizon two-person zero-sum risk-sensitive stochastic game for continuous-time Markov chains with Borel state and action spaces, in which the payoff rates, transition rates, and terminal reward functions may be unbounded from below and from above, and the policies may be history-dependent. Under suitable conditions, we establish the existence of a solution to the corresponding Shapley equation (SE) by an approximation technique. Then, using the SE and an extension of Dynkin's formula, we prove the existence of a Nash equilibrium and verify that the value of the stochastic game is the unique solution to the SE. Moreover, we develop a value-iteration-type algorithm for approximating the value of the stochastic game. The convergence of the algorithm is proved via a special contraction operator for our risk-sensitive stochastic game. Finally, we illustrate our main results with two examples.
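For intuition only, the value-iteration idea mentioned in the abstract can be sketched in its simplest classical setting: Shapley's discrete-time, discounted, risk-neutral zero-sum game with finitely many states and two actions per player. This is not the paper's algorithm, which handles continuous time, a risk-sensitive criterion, and Borel spaces. The function names, the 2x2 action restriction, and the discount factor `beta` are all assumptions made for illustration.

```python
from typing import List

Matrix = List[List[float]]


def matrix_game_value_2x2(m: Matrix) -> float:
    """Value of a 2x2 zero-sum matrix game (row player maximizes)."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))  # best guaranteed payoff for the row player
    minimax = min(max(a, c), max(b, d))  # best guaranteed cap for the column player
    if maximin == minimax:
        # A pure-strategy saddle point exists.
        return maximin
    # No saddle point: closed-form value under mixed strategies.
    return (a * d - b * c) / (a + d - b - c)


def shapley_value_iteration(
    reward: List[Matrix],             # reward[s][i][j]: payoff in state s for actions (i, j)
    trans: List[List[List[List[float]]]],  # trans[s][i][j][t]: probability of moving s -> t
    beta: float,                      # discount factor in (0, 1); illustrative assumption
    tol: float = 1e-10,
    max_iter: int = 10_000,
) -> List[float]:
    """Iterate the Shapley operator until successive iterates differ by less than tol."""
    n = len(reward)
    v = [0.0] * n
    for _ in range(max_iter):
        new_v = []
        for s in range(n):
            # Auxiliary one-shot game: immediate payoff plus discounted continuation value.
            game = [
                [
                    reward[s][i][j]
                    + beta * sum(trans[s][i][j][t] * v[t] for t in range(n))
                    for j in range(2)
                ]
                for i in range(2)
            ]
            new_v.append(matrix_game_value_2x2(game))
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


if __name__ == "__main__":
    # Hypothetical two-state example: state 0 is matching pennies and always
    # moves to state 1; state 1 is absorbing with constant payoff 1.
    reward = [[[1.0, -1.0], [-1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
    to_state_1 = [0.0, 1.0]
    trans = [[[to_state_1, to_state_1], [to_state_1, to_state_1]]] * 2
    print(shapley_value_iteration(reward, trans, beta=0.5))
```

In this simplified setting the operator is a contraction with modulus `beta`, which is why the loop converges. The abstract states that the paper proves convergence through a special contraction operator of its own, which this sketch does not reproduce. Action sets larger than 2x2 would need a linear program to solve each auxiliary matrix game.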


