Playing Stochastically in Weighted Timed Games to Emulate Memory

April 6, 2023

Benjamin Monmege,Julie Parreaux,Pierre-Alain Reynier

Weighted timed games are two-player zero-sum games played on a timed automaton equipped with integer weights. We consider optimal reachability objectives, in which one of the players, whom we call Min, wants to reach a target location while minimising the cumulated weight. Deciding whether Min has a strategy that guarantees a value lower than a given threshold is known to be undecidable (with two or more clocks), but several conditions, one of them being divergence, have been given to recover decidability. In such weighted timed games (as in untimed weighted games in the presence of negative weights), Min may need finite memory to play (close to) optimally. It is thus tempting to try to emulate this finite memory with other strategic capabilities. In this work, we allow the players to use stochastic decisions, both in the choice of transitions and of timing delays. We give, for the first time, a definition of the expected value in weighted timed games, overcoming several theoretical challenges. We then show that, in divergent weighted timed games, the stochastic value is indeed equal to the classical (deterministic) value, thus proving that Min can guarantee the same value while using only stochastic choices and no memory.
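
To make the idea of trading memory for randomness concrete, here is a minimal sketch on a hypothetical untimed toy game, not taken from the paper. All names and weights are assumptions: Min's location has an `exit` edge to the target (weight 0) and a `loop` edge to Max's location (weight -1); Max's location has a `stop` edge to the target (weight -K, with K = 10) and a `back` edge to Min's location (weight 0). A memoryless deterministic Min is stuck at 0 (or never reaches the target), a counting Min obtains -(K + 1), and a memoryless randomised Min gets arbitrarily close to that value as the loop probability p approaches 1. Only Max's two pure memoryless responses are compared.

```python
from fractions import Fraction

K = 10  # hypothetical magnitude of the weight on Max's "stop" edge


def expected_weight(p, max_stops):
    """Expected cumulated weight from Min's location when Min takes the
    "loop" edge (weight -1) with probability p, for 0 <= p < 1, and Max
    plays one fixed pure choice at every visit."""
    if max_stops:
        # With probability p Min pays -1 and Max then ends the play with -K;
        # otherwise Min exits directly with weight 0.
        return p * (-1 - K)
    # Max always sends the play back, so E = p * (-1 + E), i.e. E = -p / (1 - p).
    return -p / (1 - p)


def min_guarantee(p):
    """Weight Min can guarantee with loop probability p, assuming Max picks
    the pure memoryless response that is worse for Min (the larger weight)."""
    return max(expected_weight(p, True), expected_weight(p, False))


def counting_strategy_value(n):
    """Deterministic Min with a counter: loop n times, then exit.
    Max's best reply is either to stop at the first visit or never to stop."""
    return max(-1 - K, -n)


if __name__ == "__main__":
    for p in (Fraction(1, 2), Fraction(9, 10), Fraction(99, 100)):
        print(p, min_guarantee(p))
    print("counter of size K + 1:", counting_strategy_value(K + 1))
```

In this toy, the guarantee of the randomised strategy equals -(K + 1)·p once p ≥ K/(K + 1), so it approaches the counting value without reaching it, which matches the "(close to) optimally" wording of the abstract.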

