Risk management is critical in decision-making, and the mean-variance (MV) trade-off is one of the most common criteria. However, in reinforcement learning (RL) under a dynamic environment, MV control is harder than under a static environment owing to computational difficulties. For MV-controlled RL, this paper proposes direct expected quadratic utility maximization (EQUM), whose solution is a mean-variance efficient agent. This approach not only avoids the computational difficulties but also improves empirical performance. In experiments, we demonstrate the effectiveness of the proposed EQUM in benchmark settings.
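The link between quadratic utility and the MV trade-off can be sketched as follows, assuming a quadratic utility $u(y) = y - \frac{\lambda}{2} y^2$ with risk-aversion parameter $\lambda > 0$ (the symbols here are illustrative; the paper's notation may differ). Since $\mathbb{E}[Y^2] = \operatorname{Var}(Y) + \mathbb{E}[Y]^2$, maximizing expected quadratic utility implicitly penalizes variance:

$$\mathbb{E}[u(Y)] = \mathbb{E}[Y] - \frac{\lambda}{2}\,\mathbb{E}[Y^2] = \mathbb{E}[Y] - \frac{\lambda}{2}\Bigl(\operatorname{Var}(Y) + \mathbb{E}[Y]^2\Bigr),$$

so an agent maximizing $\mathbb{E}[u(Y)]$ trades mean return against variance directly, without a separate variance-estimation step.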