Distributed Asynchronous Policy Iteration for Sequential Zero-Sum Games and Minimax Control - Zhuanzhi Papers


Policy iteration · CASE · Policy improvement · Policy evaluation · UniFormer

October 20, 2021

Distributed Asynchronous Policy Iteration for Sequential Zero-Sum Games and Minimax Control


Dimitri Bertsekas

We introduce a contractive abstract dynamic programming framework and related policy iteration algorithms, specifically designed for sequential zero-sum games and minimax problems with a general structure. Aside from greater generality, the advantage of our algorithms over alternatives is that they resolve some long-standing convergence difficulties of the "natural" policy iteration algorithm, which have been known since the Pollatschek and Avi-Itzhak method [PoA69] for finite-state Markov games. Mathematically, this "natural" algorithm is a form of Newton's method for solving Bellman's equation, but Newton's method, contrary to the case of single-player DP problems, is not globally convergent in the case of a minimax problem, because the Bellman operator may have components that are neither convex nor concave. Our algorithms address this difficulty by introducing alternating player choices, and by using a policy-dependent mapping with a uniform sup-norm contraction property, similar to earlier works by Bertsekas and Yu [BeY10], [BeY12], [YuB13]. Moreover, our algorithms allow a convergent and highly parallelizable implementation, which is based on state space partitioning, and distributed asynchronous policy evaluation and policy improvement operations within each set of the partition. Our framework is also suitable for the use of reinforcement learning methods based on aggregation, which may be useful for large-scale problem instances.
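A sup-norm contraction is the kind of property that lets updates on different parts of the state space run out of sync and still converge. The sketch below illustrates this with a simpler, swapped-in technique: asynchronous value iteration over a state-space partition for a discounted finite-state minimax problem. It is not the paper's policy iteration algorithm. The discounted setting, the arrays `g` (stage costs) and `P` (transition probabilities), the discount factor `alpha`, and all function names are assumptions made for illustration.

```python
import numpy as np


def random_game(n, m_u, m_v, seed=0):
    """Build a hypothetical random minimax problem with n states.

    g[x, u, v]    : stage cost when the minimizer picks u and the maximizer picks v
    P[x, u, v, y] : probability of moving from state x to state y
    """
    rng = np.random.default_rng(seed)
    g = rng.uniform(0.0, 1.0, size=(n, m_u, m_v))
    P = rng.uniform(size=(n, m_u, m_v, n))
    P /= P.sum(axis=-1, keepdims=True)  # normalize each row to a distribution
    return g, P


def minimax_bellman(J, g, P, alpha):
    """(TJ)(x) = min_u max_v [ g(x,u,v) + alpha * sum_y P(x,u,v,y) J(y) ]."""
    Q = g + alpha * (P @ J)             # shape (n, m_u, m_v)
    return Q.max(axis=2).min(axis=1)    # maximizer moves after the minimizer


def async_partitioned_vi(g, P, alpha, blocks, sweeps=4000, seed=0):
    """Update one randomly chosen block of states at a time.

    Each update reads the current (possibly stale for other blocks) J,
    which mimics processors working on different sets of the partition.
    """
    rng = np.random.default_rng(seed)
    J = np.zeros(g.shape[0])
    for _ in range(sweeps):
        b = blocks[rng.integers(len(blocks))]
        Q = g[b] + alpha * (P[b] @ J)
        J[b] = Q.max(axis=2).min(axis=1)
    return J


if __name__ == "__main__":
    g, P = random_game(6, 3, 2, seed=1)
    J = async_partitioned_vi(g, P, alpha=0.9, blocks=[[0, 1, 2], [3, 4, 5]])
    # At convergence J should be (numerically) a fixed point of T.
    print(np.max(np.abs(J - minimax_bellman(J, g, P, 0.9))))
```

Because the minimax operator `T` here contracts distances in the sup-norm by the factor `alpha`, the order in which blocks are updated does not affect the limit, provided every block keeps being updated. The "natural" policy iteration discussed in the abstract does not inherit this guarantee in the minimax case, which is the difficulty the paper addresses.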

