Interval Markov Decision Processes (IMDPs) are uncertain Markov models in which the transition probabilities belong to intervals. Recently, there has been a surge of research on employing IMDPs as abstractions of stochastic systems for control synthesis. However, due to the absence of synthesis algorithms for IMDPs with continuous action spaces, the action space is assumed discrete a priori, which is a restrictive assumption for many applications. Motivated by this, we introduce continuous-action IMDPs (caIMDPs), where the bounds on transition probabilities are functions of the action variables, and study value iteration for maximizing expected cumulative rewards. Specifically, we show that solving the max-min problem associated with value iteration is equivalent to solving $|\mathcal{Q}|$ max problems, where $|\mathcal{Q}|$ is the number of states of the caIMDP. Then, exploiting the simple form of these max problems, we identify cases where value iteration over caIMDPs can be solved efficiently (e.g., with linear or convex programming). We also obtain other interesting insights: e.g., when the action set $\mathcal{A}$ is a polytope and the transition bounds are linear, synthesizing over a discrete-action IMDP whose actions are the vertices of $\mathcal{A}$ is sufficient for optimality. We demonstrate our results on a numerical example. Finally, we include a short discussion on employing caIMDPs as abstractions for control synthesis.
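To make the max-min computation underlying value iteration concrete, the following is a minimal sketch of pessimistic value iteration over a finite-action IMDP, i.e., the setting that, by the vertex result above, suffices when $\mathcal{A}$ is a polytope and the transition bounds are linear (the actions are taken to be the vertices of $\mathcal{A}$). The inner minimization over the interval ambiguity set uses the standard ordering-based greedy rule; the function names, discount factor, and array layout are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def worst_case_expectation(p_low, p_high, values):
    """Solve min_{p} p . values over the interval ambiguity set
    {p : p_low <= p <= p_high, sum(p) == 1} (assumed nonempty).
    Greedy rule: push as much probability mass as possible onto
    low-value states, respecting the per-state interval bounds."""
    order = np.argsort(values)            # successor states, ascending in value
    p = p_low.astype(float).copy()
    slack = 1.0 - p.sum()                 # remaining mass to distribute
    for q in order:
        bump = min(p_high[q] - p[q], slack)
        p[q] += bump
        slack -= bump
        if slack <= 0.0:
            break
    return p @ values

def robust_value_iteration(P_low, P_high, rewards, gamma=0.95, tol=1e-8):
    """Pessimistic (max-min) value iteration over a finite-action IMDP.
    P_low, P_high: transition-bound arrays of shape (n_states, n_actions, n_states).
    rewards: array of shape (n_states, n_actions).
    A discount factor gamma < 1 is assumed here so the iteration converges."""
    n_states, n_actions, _ = P_low.shape
    V = np.zeros(n_states)
    while True:
        V_new = np.empty_like(V)
        for q in range(n_states):
            # Outer max over actions, inner min over feasible distributions.
            V_new[q] = max(
                rewards[q, a]
                + gamma * worst_case_expectation(P_low[q, a], P_high[q, a], V)
                for a in range(n_actions)
            )
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
```

The greedy inner step is exact for interval ambiguity sets: the adversary's optimal distribution saturates the upper bounds of the lowest-value successors first, so each state update costs only a sort, $O(|\mathcal{Q}| \log |\mathcal{Q}|)$.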