Planning in Markov decision processes (MDPs) typically optimises the expected cost. However, optimising the expectation does not consider the risk that for any given run of the MDP, the total cost received may be unacceptably high. An alternative approach is to find a policy which optimises a risk-averse objective such as conditional value at risk (CVaR). In this work, we begin by showing that there can be multiple policies which obtain the optimal CVaR. We formulate the lexicographic optimisation problem of minimising the expected cost subject to the constraint that the CVaR of the total cost is optimal. We present an algorithm for this problem and evaluate our approach on three domains, including a road navigation domain based on real traffic data. Our experimental results demonstrate that our lexicographic approach attains improved expected cost while maintaining the optimal CVaR.
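To make concrete the claim that several policies can share the optimal CVaR while differing in expected cost, the following is a minimal sketch. It is not the algorithm presented in this work. It assumes each policy has already been reduced to a finite distribution over total cost, and it takes `alpha` to be the fraction of worst-case runs covered by the tail (conventions vary). The two distributions, their names, and the helper functions are hypothetical and chosen only for illustration. CVaR is computed with the Rockafellar-Uryasev formula, whose minimum for a discrete distribution is attained at a support point.

```python
# Illustrative sketch only: hypothetical total-cost distributions, given as
# lists of (cost, probability) pairs. Not the method proposed in the paper.

def expected_cost(dist):
    """Expected total cost of a discrete distribution."""
    return sum(p * c for c, p in dist)


def cvar(dist, alpha):
    """CVaR of the cost over the worst alpha-fraction of runs.

    Uses CVaR_alpha(Z) = min_s { s + E[(Z - s)^+] / alpha }; for a discrete
    distribution the minimiser can be taken among the support points.
    """
    def objective(s):
        return s + sum(p * max(c - s, 0.0) for c, p in dist) / alpha

    return min(objective(c) for c, _ in dist)


def lexicographic_choice(policies, alpha, tol=1e-9):
    """Among policies with minimal CVaR, return the one with least expected cost."""
    best_cvar = min(cvar(d, alpha) for d in policies.values())
    tied = {n: d for n, d in policies.items() if cvar(d, alpha) <= best_cvar + tol}
    return min(tied, key=lambda n: expected_cost(tied[n]))


alpha = 0.1  # assumed tail level: the worst 10% of runs
policies = {
    "a": [(10.0, 0.9), (100.0, 0.1)],
    "b": [(2.0, 0.5), (10.0, 0.4), (100.0, 0.1)],
}

for name, dist in policies.items():
    print(name, "CVaR:", cvar(dist, alpha), "expected cost:", expected_cost(dist))
print("lexicographic choice:", lexicographic_choice(policies, alpha))
```

In this toy setting both distributions have the same worst 10% of outcomes and therefore the same CVaR, so optimising CVaR alone cannot distinguish them; the secondary expected-cost criterion selects the second one. Enumerating candidate policies is used here only because the example is tiny.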