We present a reinforcement learning (RL) approach for the robust optimisation of risk-aware performance criteria. To allow agents to express a wide variety of risk-reward profiles, we assess the value of a policy using rank-dependent expected utility (RDEU). RDEU allows an agent to seek gains while simultaneously protecting itself against downside events. To robustify optimal policies against model uncertainty, we assess a policy not by its distribution but by the worst possible distribution that lies within a Wasserstein ball around it. Our problem formulation may thus be viewed as an actor choosing a policy (the outer problem) and an adversary then acting to worsen the performance of that policy (the inner problem). We develop explicit policy gradient formulae for the inner and outer problems and demonstrate the efficacy of the approach on three prototypical financial problems: robust portfolio allocation, optimising a benchmark, and statistical arbitrage.
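The two ingredients named above, RDEU and the Wasserstein ball, can be illustrated on samples. The following is a minimal sketch, not the paper's implementation: it assumes the common convention in which a distortion function g (with g(0) = 0 and g(1) = 1) is applied to the empirical CDF, and it uses the one-dimensional, equal-sample-size form of the Wasserstein distance. The function names `rdeu` and `wasserstein_1d` are hypothetical, and the paper's own sign and distortion conventions may differ.

```python
import numpy as np


def rdeu(samples, utility, distortion):
    """Empirical rank-dependent expected utility (illustrative convention).

    Computes sum_i [g(i/n) - g((i-1)/n)] * u(x_(i)), where x_(1) <= ... <= x_(n)
    are the sorted samples, u is the utility and g the distortion (assumed to
    satisfy g(0) = 0 and g(1) = 1). Both callables must accept NumPy arrays.
    """
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    grid = np.arange(n + 1) / n            # 0, 1/n, ..., 1
    weights = np.diff(distortion(grid))    # distorted probability of each rank
    return float(np.sum(weights * utility(x)))


def wasserstein_1d(x, y, p=2):
    """p-Wasserstein distance between two equal-size empirical samples in 1D.

    In one dimension the optimal coupling matches sorted samples, so the
    distance is the p-norm of the differences between order statistics.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.size != y.size:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(x - y) ** p) ** (1.0 / p))
```

With the identity distortion, `rdeu` reduces to the ordinary sample mean of the utility. Under this convention a concave distortion such as `np.sqrt` places more weight on the lowest-ranked outcomes and so gives a more pessimistic value. An adversary constrained to a Wasserstein ball of radius ε could only propose sample sets `y` with `wasserstein_1d(x, y) <= ε`.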