A classic solution technique for Markov decision processes (MDP) and stochastic games (SG) is value iteration (VI). Due to its good practical performance, this approximate approach is typically preferred over exact techniques, even though until recently no practical bounds on the imprecision of its results could be given. As a consequence, even the most widely used model checkers could return arbitrarily wrong results. Over the past decade, different works derived stopping criteria, indicating when the precision reaches the desired level, for various settings, in particular MDP with reachability, total reward, and mean payoff, and SG with reachability. In this paper, we provide the first stopping criteria for VI on SG with total reward and mean payoff, yielding the first anytime algorithms in these settings. To this end, we provide the solution in two flavours: first through a reduction to the MDP case, and second directly on SG. The former is simpler and automatically benefits from any advances on MDP. The latter allows for more local computations, paving the way towards better practical efficiency. Our solution unifies the previously mentioned approaches for MDP and SG and their underlying ideas. To achieve this, we isolate objective-specific subroutines as well as identify objective-independent concepts. These structural concepts, while surprisingly simple, form the very essence of the unified solution.
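To illustrate the kind of stopping criterion discussed above, the following minimal Python sketch performs interval (bounded) value iteration for maximal reachability on a toy MDP: a lower and an upper bound on the value are iterated, and the computation stops once they differ by less than a chosen precision. The MDP, the state names, and the constant EPSILON are hypothetical examples, not taken from the paper, and the sketch assumes the only end components are the absorbing target and sink states, so no end-component collapsing is needed.

```python
EPSILON = 1e-6  # desired precision (hypothetical choice)

# Hypothetical toy MDP: mdp[state] = list of actions,
# each action = list of (probability, successor) pairs.
mdp = {
    "s0": [[(0.6, "target"), (0.4, "sink")],
           [(0.5, "s1"), (0.5, "sink")]],
    "s1": [[(0.9, "target"), (0.1, "sink")]],
}
TARGET, SINK = "target", "sink"
states = list(mdp) + [TARGET, SINK]

def bellman(values):
    """One Bellman update for maximal reachability; target/sink stay fixed."""
    new = dict(values)
    for s, actions in mdp.items():
        new[s] = max(sum(p * values[t] for p, t in a) for a in actions)
    return new

# Interval value iteration: lower bound L and upper bound U.
L = {s: 0.0 for s in states}; L[TARGET] = 1.0
U = {s: 1.0 for s in states}; U[SINK] = 0.0

iterations = 0
while max(U[s] - L[s] for s in states) >= EPSILON:  # stopping criterion
    L, U = bellman(L), bellman(U)
    iterations += 1

print(f"value(s0) in [{L['s0']:.7f}, {U['s0']:.7f}] after {iterations} iterations")
```

Because both bounds converge to the true value under the stated assumption, the loop can be interrupted at any time and still reports a sound interval, which is exactly the anytime behaviour the stopping criteria in the paper aim for (there in the more general SG setting with total reward and mean payoff).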