We revisit the problem of finding optimal strategies for deterministic Markov Decision Processes (DMDPs), and the closely related problem of testing the feasibility of a system of $m$ linear inequalities on $n$ real variables with at most two variables per inequality (2VPI). We give a randomized trade-off algorithm that solves both problems in $\tilde{O}(nmh+(n/h)^3)$ time using $\tilde{O}(n^2/h+m)$ space, for any parameter $h\in [1,n]$. In particular, using subquadratic space we obtain $\tilde{O}(nm+n^{3/2}m^{3/4})$ running time, which improves by a polynomial factor upon all known upper bounds for non-dense instances with $m=O(n^{2-\epsilon})$. Moreover, using linear space we match the randomized $\tilde{O}(nm+n^3)$ time bound of Cohen and Megiddo [SICOMP'94], which required $\tilde{\Theta}(n^2+m)$ space. Additionally, we give a new algorithm for the Discounted All-Pairs Shortest Paths problem, introduced by Madani et al. [TALG'10], which extends DMDPs with optional end vertices. For the case of uniform discount factors, we give a deterministic algorithm running in $\tilde{O}(n^{3/2}m^{3/4})$ time, which significantly improves upon the randomized $\tilde{O}(n^2\sqrt{m})$ bound of Madani et al.
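The two headline bounds can be read off the trade-off by choosing $h$. The arithmetic below is our own sketch, not taken from the paper: it ignores polylogarithmic factors, and the specific choices of $h$ are illustrative. Balancing the two time terms gives the subquadratic-space bound; taking $h$ as large as linear space allows gives the Cohen and Megiddo bound.

$$
\begin{aligned}
T(h) &= nmh + (n/h)^3, \qquad S(h) = n^2/h + m,\\
nmh = (n/h)^3 &\iff h^4 = n^2/m \iff h^\ast = n^{1/2}m^{-1/4} \quad (h^\ast \ge 1 \text{ iff } m \le n^2),\\
T(h^\ast) &= 2\,n^{3/2}m^{3/4}, \qquad S(h^\ast) = n^{3/2}m^{1/4} + m = O(n^{2-\epsilon/4}) \text{ for } m = O(n^{2-\epsilon}),\\
h = 1 \text{ when } m > n^2 &: \quad T = nm + n^3 \le nm + n^{3/2}m^{3/4},\\
h = n^2/m \text{ when } n \le m \le n^2 &: \quad S = 2m = O(m), \quad T = n^3 + (m/n)^3 \le 2n^3.
\end{aligned}
$$

So $h=\max(1,h^\ast)$ yields $\tilde{O}(nm+n^{3/2}m^{3/4})$ time for every $m$. For linear space in the remaining ranges, $h=n$ (if $m<n$) gives $S=O(n)$ and $T=O(n^2m)=O(n^3)$, and $h=1$ (if $m>n^2$) gives $S=O(m)$ and $T=O(nm+n^3)$.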