将反事实遗憾最小化与信息收益相结合,以解决大范围不完善信息游戏 (Combining Counterfactual Regret Minimization with Information Gain to Solve Extensive Games with Imperfect Information) - 专知论文

会员服务 ·

0

INFORMS · Extensibility · 不完美信息 · 回合 · INTERACT ·

2021 年 10 月 15 日

Combining Counterfactual Regret Minimization with Information Gain to Solve Extensive Games with Imperfect Information

翻译：将反事实遗憾最小化与信息收益相结合,以解决大范围不完善信息游戏

Chen Qiu,Xuan Wang,Tianzi Ma,Yaojun Wen,Jiajia Zhang

Counterfactual regret Minimization (CFR) is an effective algorithm for solving extensive games with imperfect information (IIEG). However, CFR is only allowed to apply in a known environment such as the transition functions of the chance player and reward functions of the terminal nodes are aware in IIEGs. For uncertain scenarios like the cases under Reinforcement Learning (RL), variational information maximizing exploration (VIME) provides a useful framework for exploring environments using information gain. In this paper, we propose a method named VCFR that combines CFR with information gain to calculate Nash Equilibrium (NE) in the scenario of IIEG under RL. By adding information gain to the reward, the average strategy calculated by CFR can be directly used as an interactive strategy, and the exploration efficiency of the algorithm to uncertain environments has been significantly improved. Experimentally, The results demonstrate that this approach can not only effectively reduce the number of interactions with the environment, but also find an approximate NE.

翻译：反事实遗憾最小化(CFR)是解决信息不完善的大游戏的有效算法(IIEG),然而,CFR只被允许在已知的环境中应用,例如机会玩家的过渡功能和终端节点的奖赏功能在IIEGs中已经意识到。对于诸如加强学习(RL)下的案例等不确定的情景,变异信息最大化探索(VIME)为利用信息收益探索环境提供了一个有用的框架。在本文中,我们提出了一个名为VCFR(VCFR)的方法,将CFR(CFR)与根据REG(IIEG)方案计算 Nash Equilibrium(NE)的信息收益结合起来。通过将信息收益添加到收益中,CFR计算的平均战略可以直接用作互动战略,并大大提高了对不确定环境的算法的探索效率。实验结果表明,这一方法不仅能够有效减少与环境的互动次数,而且还可以找到近似NE。

0

相关内容

INFORMS

《计算机信息》杂志发表高质量的论文，扩大了运筹学和计算的范围，寻求有关理论、方法、实验、系统和应用方面的原创研究论文、新颖的调查和教程论文，以及描述新的和有用的软件工具的论文。官网链接：https://pubsonline.informs.org/journal/ijoc

【ICML2021】异质风险最小化，Heterogeneous Risk Minimization

专知会员服务

16+阅读 · 2021年5月21日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

【ICML2020】对比多视角表示学习

【ICML2020】对比多视角表示学习

专知会员服务

53+阅读 · 2020年6月28日

可解释强化学习，Explainable Reinforcement Learning: A Survey

可解释强化学习，Explainable Reinforcement Learning: A Survey

专知会员服务

131+阅读 · 2020年5月14日

【牛津大学ICLR2020】通过元学习的贝叶斯自适应深度RL, VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning

【牛津大学ICLR2020】通过元学习的贝叶斯自适应深度RL, VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning

专知会员服务

25+阅读 · 2020年2月28日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

MIT新书《强化学习与最优控制》

MIT新书《强化学习与最优控制》

专知会员服务

280+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

动物脑的好奇心和强化学习的好奇心

动物脑的好奇心和强化学习的好奇心

CreateAMind

10+阅读 · 2019年1月26日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

OpenAI丨深度强化学习关键论文列表

OpenAI丨深度强化学习关键论文列表

中国人工智能学会

17+阅读 · 2018年11月10日

【OpenAI】深度强化学习关键论文列表

【OpenAI】深度强化学习关键论文列表

专知

11+阅读 · 2018年11月10日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

An improper estimator with optimal excess risk in misspecified density estimation and logistic regression

Arxiv

0+阅读 · 2021年12月8日

Declarative Approaches to Counterfactual Explanations for Classification

Arxiv

0+阅读 · 2021年12月7日

Player of Games

Arxiv

0+阅读 · 2021年12月6日

Event-Based Communication in Distributed Q-Learning

Arxiv

0+阅读 · 2021年12月6日

Optimal Counterfactual Explanations in Tree Ensembles

Arxiv

5+阅读 · 2021年6月25日

Cross-domain Imitation from Observations

Arxiv

8+阅读 · 2021年5月20日

Unifying Online and Counterfactual Learning to Rank

Arxiv

6+阅读 · 2020年12月8日

Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games

Arxiv

3+阅读 · 2020年6月15日

Thermodynamics and Feature Extraction by Machine Learning

Arxiv

3+阅读 · 2018年10月18日

Variance-based regularization with convex objectives

Arxiv

5+阅读 · 2017年12月14日

VIP会员

文章信息

相关主题

不完美信息

相关VIP内容

【ICML2021】异质风险最小化，Heterogeneous Risk Minimization

专知会员服务

16+阅读 · 2021年5月21日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

【ICML2020】对比多视角表示学习

【ICML2020】对比多视角表示学习

专知会员服务

53+阅读 · 2020年6月28日

可解释强化学习，Explainable Reinforcement Learning: A Survey

可解释强化学习，Explainable Reinforcement Learning: A Survey

专知会员服务

131+阅读 · 2020年5月14日

【牛津大学ICLR2020】通过元学习的贝叶斯自适应深度RL, VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning

【牛津大学ICLR2020】通过元学习的贝叶斯自适应深度RL, VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning

专知会员服务

25+阅读 · 2020年2月28日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

MIT新书《强化学习与最优控制》

MIT新书《强化学习与最优控制》

专知会员服务

280+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《乌克兰无人机产业：志愿者与政策在构建新兴无人机产业中的协同作用》最新报告

《人工智能辅助决策中的数据可视化：系统性综述》

人工智能驱动弹药制造现代化：美国陆军转型之路

《敏捷作战部署中枢纽-辐条基地选址优化研究》80页

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

动物脑的好奇心和强化学习的好奇心

动物脑的好奇心和强化学习的好奇心

CreateAMind

10+阅读 · 2019年1月26日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

OpenAI丨深度强化学习关键论文列表

OpenAI丨深度强化学习关键论文列表

中国人工智能学会

17+阅读 · 2018年11月10日

【OpenAI】深度强化学习关键论文列表

【OpenAI】深度强化学习关键论文列表

专知

11+阅读 · 2018年11月10日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

相关论文

An improper estimator with optimal excess risk in misspecified density estimation and logistic regression

Arxiv

0+阅读 · 2021年12月8日

Declarative Approaches to Counterfactual Explanations for Classification

Arxiv

0+阅读 · 2021年12月7日

Player of Games

Arxiv

0+阅读 · 2021年12月6日

Event-Based Communication in Distributed Q-Learning

Arxiv

0+阅读 · 2021年12月6日

Optimal Counterfactual Explanations in Tree Ensembles

Arxiv

5+阅读 · 2021年6月25日

Cross-domain Imitation from Observations

Arxiv

8+阅读 · 2021年5月20日

Unifying Online and Counterfactual Learning to Rank

Arxiv

6+阅读 · 2020年12月8日

Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games

Arxiv

3+阅读 · 2020年6月15日

Thermodynamics and Feature Extraction by Machine Learning

Arxiv

3+阅读 · 2018年10月18日

Variance-based regularization with convex objectives

Arxiv

5+阅读 · 2017年12月14日

微信扫码咨询专知VIP会员