Interval Markov Decision Processes with Continuous Action-Spaces


7 April 2023


Giannis Delimpaltadakis, Morteza Lahijanian, Manuel Mazo Jr., Luca Laurenti

From arXiv. This work will be presented at the 26th ACM International Conference on Hybrid Systems: Computation and Control (HSCC), 9-12 May 2023, San Antonio, TX, USA.

Interval Markov Decision Processes (IMDPs) are finite-state uncertain Markov models, where the transition probabilities belong to intervals. Recently, there has been a surge of research on employing IMDPs as abstractions of stochastic systems for control synthesis. However, due to the absence of algorithms for synthesis over IMDPs with continuous action-spaces, the action-space is assumed discrete a priori, which is a restrictive assumption for many applications. Motivated by this, we introduce continuous-action IMDPs (caIMDPs), where the bounds on transition probabilities are functions of the action variables, and study value iteration for maximizing expected cumulative rewards. Specifically, we decompose the max-min problem associated with value iteration into $|\mathcal{Q}|$ max problems, where $|\mathcal{Q}|$ is the number of states of the caIMDP. Then, exploiting the simple form of these max problems, we identify cases where value iteration over caIMDPs can be solved efficiently (e.g., with linear or convex programming). We also gain other interesting insights: e.g., in certain cases where the action set $\mathcal{A}$ is a polytope, synthesis over a discrete-action IMDP, where the actions are the vertices of $\mathcal{A}$, is sufficient for optimality. We demonstrate our results on a numerical example. Finally, we include a short discussion on employing caIMDPs as abstractions for control synthesis.
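To make the max-min structure of value iteration concrete, here is a minimal Python sketch of pessimistic value iteration on a discrete-action IMDP. It is the kind of discrete-action baseline the abstract mentions, not the caIMDP algorithm of the paper. The function names, the discount factor, the per state-action reward table, and the toy numbers are all assumptions introduced for illustration. The inner minimization uses the well-known greedy ordering for interval bounds: start from the lower bounds and give the remaining probability mass to the lowest-valued successor states first.

```python
import numpy as np


def worst_case_distribution(lower, upper, values):
    """Return the distribution within [lower, upper] that minimizes p @ values.

    Assumes lower.sum() <= 1 <= upper.sum(), so a feasible distribution exists.
    """
    p = lower.copy()
    budget = 1.0 - p.sum()  # probability mass still to be assigned
    for j in np.argsort(values):  # lowest-valued successor states first
        add = min(upper[j] - p[j], budget)
        p[j] += add
        budget -= add
    return p


def robust_value_iteration(lower, upper, rewards, discount=0.9, iters=200):
    """Max over actions of the min over interval-constrained transitions.

    lower, upper: arrays of shape (n_states, n_actions, n_states).
    rewards: array of shape (n_states, n_actions) (an assumed reward model).
    Returns the pessimistic value function and a greedy policy.
    """
    n_states, n_actions, _ = lower.shape
    values = np.zeros(n_states)
    for _ in range(iters):
        q = np.empty((n_states, n_actions))
        for s in range(n_states):
            for a in range(n_actions):
                p = worst_case_distribution(lower[s, a], upper[s, a], values)
                q[s, a] = rewards[s, a] + discount * p @ values
        values = q.max(axis=1)
    return values, q.argmax(axis=1)


# Hypothetical toy model: two states, two actions, same bounds from both states.
# Action 0 reaches state 1 with probability in [0.2, 0.4], action 1 in [0.6, 0.9].
lower = np.array([[[0.6, 0.2], [0.1, 0.6]]] * 2)
upper = np.array([[[0.8, 0.4], [0.4, 0.9]]] * 2)
rewards = np.array([[0.0, 0.0], [1.0, 1.0]])  # reward 1 for being in state 1

values, policy = robust_value_iteration(lower, upper, rewards)
print(values, policy)
```

The two actions in the toy model can be read as the vertices of a one-dimensional action set $\mathcal{A} = [0, 1]$. Whether restricting to vertices loses optimality depends on how the probability bounds vary with the action, and the abstract does not state those conditions, so the sketch makes no claim about it.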

