This work studies non-cooperative Multi-Agent Reinforcement Learning (MARL), in which multiple agents interact in the same environment and each aims to maximize its individual return. Scaling up the number of agents is challenging because of the non-stationarity that a large population introduces. To address this issue, Mean Field Games (MFG) rely on symmetry and homogeneity assumptions to approximate games with very large populations. Recently, deep Reinforcement Learning has been used to scale MFG to games with larger state spaces. Current methods rely on smoothing techniques such as averaging the Q-values or the updates of the mean-field distribution. This work presents a different approach to stabilizing learning, based on proximal updates of the mean-field policy. We name our algorithm \textit{Mean Field Proximal Policy Optimization (MF-PPO)}, and we empirically show the effectiveness of our method in the OpenSpiel framework.
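For intuition, the proximal update can be sketched with the standard PPO clipped surrogate objective, here written with the policy additionally conditioned on the current mean-field distribution $\mu$. This is a minimal sketch under that assumption, not the paper's exact objective; the symbols $r_t(\theta)$, $\hat{A}_t$ (advantage estimate), and $\epsilon$ (clipping range) follow standard PPO notation and are not taken from this abstract:
\[
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t, \mu)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t, \mu)},
\qquad
L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\!\left[ \min\!\Big( r_t(\theta)\,\hat{A}_t,\; \mathrm{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\,\hat{A}_t \Big) \right].
\]
The clipping bounds the size of each policy update, which is the stabilizing mechanism the abstract refers to as proximal updates of the mean-field policy.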