Strategy Synthesis for Global Window PCTL

Given a Markov decision process (MDP) $M$ and a formula $\Phi$, the strategy synthesis problem asks whether there exists a strategy $\sigma$ such that the resulting Markov chain $M[\sigma]$ satisfies $\Phi$. This problem is known to be undecidable for the probabilistic temporal logic PCTL. We study a class of formulae that can be seen as a fragment of PCTL in which a local, bounded-horizon property is enforced all along an execution. Moreover, we allow linear expressions in the probabilistic inequalities. This logic sits at the frontier of decidability: whether synthesis is decidable depends on the type of strategies considered. In particular, strategy synthesis is decidable when strategies are deterministic, while the general problem is undecidable.
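To make the problem statement concrete, here is a minimal sketch in Python. It is not the decision procedure from the paper. It assumes a hypothetical toy MDP (`MDP`), restricts attention to memoryless deterministic strategies, and checks one simple window-style property: from every state reachable under the strategy, a target state is visited within `horizon` steps with probability at least `threshold`. All names (`MDP`, `induced_chain`, `reach_within`, `synthesize`) are illustrative assumptions. Because deterministic strategies may in general need memory, a `None` result here only means that no memoryless deterministic strategy works.

```python
from itertools import product

# Hypothetical toy MDP: MDP[state][action] = {successor: probability}
MDP = {
    "s0": {"a": {"s1": 0.5, "s2": 0.5}, "b": {"s0": 1.0}},
    "s1": {"a": {"goal": 0.9, "s0": 0.1}},
    "s2": {"a": {"s2": 1.0}, "b": {"goal": 0.6, "s0": 0.4}},
    "goal": {"a": {"s0": 1.0}},
}


def induced_chain(mdp, strategy):
    """Markov chain M[sigma] obtained by fixing one action per state."""
    return {s: mdp[s][strategy[s]] for s in mdp}


def reach_within(chain, targets, horizon):
    """Probability, from each state, of visiting `targets` within `horizon` steps."""
    prob = {s: 1.0 if s in targets else 0.0 for s in chain}
    for _ in range(horizon):
        # One backward step of bounded reachability; targets stay at 1.
        prob = {
            s: 1.0 if s in targets
            else sum(p * prob[t] for t, p in chain[s].items())
            for s in chain
        }
    return prob


def reachable(chain, start):
    """States reachable from `start` with positive probability."""
    seen, stack = {start}, [start]
    while stack:
        s = stack.pop()
        for t, p in chain[s].items():
            if p > 0 and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def synthesize(mdp, start, targets, horizon, threshold):
    """Brute-force search over memoryless deterministic strategies (assumption)."""
    states = list(mdp)
    for choice in product(*(sorted(mdp[s]) for s in states)):
        strategy = dict(zip(states, choice))
        chain = induced_chain(mdp, strategy)
        prob = reach_within(chain, targets, horizon)
        # The window property must hold at every state the execution can visit.
        if all(prob[s] >= threshold for s in reachable(chain, start)):
            return strategy
    return None


if __name__ == "__main__":
    print(synthesize(MDP, "s0", {"goal"}, horizon=2, threshold=0.5))
```

The search is exponential in the number of states, so it only serves to illustrate what "there exists a strategy such that $M[\sigma]$ satisfies $\Phi$" means on a small instance.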


Related content

Markov chain


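A Markov chain, named after Andrey Markov (A. A. Markov, 1856–1922), is a discrete-time stochastic process with the Markov property: given the present state, the past (the history of states before the present) is irrelevant for predicting the future (the states after the present). At each step of a Markov chain, the system may move from one state to another, or remain in its current state, according to a probability distribution. A change of state is called a transition, and the probabilities associated with the different state changes are called transition probabilities. A random walk is an example of a Markov chain. In a random walk, the state at each step is a vertex of a graph, and each step moves to any adjacent vertex with equal probability, regardless of the path taken so far.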
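The random-walk example can be sketched in a few lines of Python. The graph below (`GRAPH`, a three-vertex path) and the helper names `random_walk_matrix` and `step` are assumptions made for illustration only.

```python
def random_walk_matrix(adjacency):
    """Row-stochastic transition matrix of the uniform random walk on a graph."""
    nodes = sorted(adjacency)
    matrix = [
        [1.0 / len(adjacency[u]) if v in adjacency[u] else 0.0 for v in nodes]
        for u in nodes
    ]
    return nodes, matrix


def step(dist, matrix):
    """Advance a distribution over states by one transition."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


# Hypothetical example graph: the path 0 - 1 - 2
GRAPH = {0: [1], 1: [0, 2], 2: [1]}

nodes, P = random_walk_matrix(GRAPH)
dist = [1.0, 0.0, 0.0]  # start at vertex 0 with certainty
for _ in range(2):
    # The next distribution depends only on the current one (Markov property).
    dist = step(dist, P)

if __name__ == "__main__":
    print(dist)
```

Each row of `P` sums to 1, and the update in `step` uses only the current distribution, never the path that led to it.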
