Conditional Value at Risk (CVaR) is a risk metric widely used in finance. However, dynamically optimizing CVaR is difficult because the problem is not a standard Markov decision process (MDP) and the principle of dynamic programming fails. In this paper, we study the infinite-horizon discrete-time MDP with a long-run CVaR criterion from the viewpoint of sensitivity-based optimization. By introducing a pseudo CVaR metric, we derive a CVaR difference formula that quantifies the difference in long-run CVaR between any two policies. We establish the optimality of deterministic policies. We then obtain a so-called Bellman local optimality equation for CVaR, which is a necessary and sufficient condition for locally optimal policies and only a necessary condition for globally optimal policies. We also derive a CVaR derivative formula that provides further sensitivity information. Building on these results, we develop a policy-iteration-type algorithm to efficiently optimize CVaR, which is shown to converge to local optima in the mixed policy space. We further discuss some extensions, including mean-CVaR optimization and the maximization of CVaR. Finally, we conduct numerical experiments on portfolio management to demonstrate the main results. Our work may shed light on dynamically optimizing CVaR from a sensitivity viewpoint.
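
For readers less familiar with the metric, the following is a minimal sketch of how CVaR can be estimated from a finite sample. It covers only the static, single-sample case and is not the long-run MDP criterion or the policy iteration algorithm studied in the paper. The function name `var_cvar`, the convention that larger values are worse (losses), and the use of the standard Rockafellar-Uryasev representation are assumptions made for illustration.

```python
import math


def var_cvar(losses, alpha):
    """Empirical VaR and CVaR of a sample of losses at level alpha in [0, 1).

    Hypothetical helper for illustration; larger values are treated as worse.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    xs = sorted(losses)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    # VaR: smallest sample value y whose empirical P(X <= y) is at least alpha.
    k = max(math.ceil(alpha * n), 1)
    var = xs[k - 1]
    # Rockafellar-Uryasev form: CVaR = VaR + E[(X - VaR)^+] / (1 - alpha).
    excess = sum(max(x - var, 0.0) for x in xs) / n
    return var, var + excess / (1.0 - alpha)
```

With `alpha = 0.5` and the four losses `[1, 2, 3, 4]`, the sketch returns a VaR of 2 and a CVaR equal to the mean of the two largest losses. The dependence of CVaR on a quantile of the whole distribution is what makes it non-additive over time, which is consistent with the abstract's remark that standard dynamic programming does not apply directly.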