Reinforcement learning (RL) is usually applied to asset management with a single goal: the maximization of profit. The extrinsic reward function used to learn an optimal strategy typically does not take into account any other preferences or constraints. We have developed a regularization method that ensures that strategies have global intrinsic affinities; that is, different personalities may prefer certain assets, and these preferences may change over time. We capitalize on these intrinsic policy affinities to make our RL model inherently interpretable. We demonstrate how RL agents can be trained to orchestrate such individual policies for particular personality profiles while still achieving high returns.
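To make the idea of an affinity regularizer concrete, the following is a minimal sketch and not the authors' implementation. It assumes that "global" means the preference is enforced on the allocation averaged over many states rather than in every single state, that a personality is encoded as a target allocation vector over assets, and that the penalty is a squared distance weighted by a coefficient `lam`. The names `affinity_penalty`, `regularized_objective`, `target_affinity`, and `lam` are hypothetical and do not come from the abstract.

```python
import numpy as np


def softmax(logits):
    """Turn raw policy outputs into portfolio weights that sum to 1 per row."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def affinity_penalty(allocations, target_affinity):
    """Squared distance between the batch-mean allocation and a target profile.

    allocations:     array of shape (n_states, n_assets), one row per state.
    target_affinity: array of shape (n_assets,), the assumed personality profile.
    Averaging over states first is what makes the constraint global here:
    individual states may deviate as long as the average matches the profile.
    """
    mean_allocation = allocations.mean(axis=0)
    return float(np.sum((mean_allocation - target_affinity) ** 2))


def regularized_objective(returns, allocations, target_affinity, lam=1.0):
    """Mean extrinsic return minus lam times the affinity penalty (maximized)."""
    return float(np.mean(returns)) - lam * affinity_penalty(
        allocations, target_affinity
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(8, 3))        # hypothetical policy outputs
    allocations = softmax(logits)           # 8 states, 3 assets
    profile = np.array([0.6, 0.3, 0.1])     # hypothetical personality profile
    returns = rng.normal(loc=0.01, scale=0.02, size=8)
    print(regularized_objective(returns, allocations, profile, lam=0.5))
```

Under these assumptions, a larger `lam` pulls the average allocation closer to the profile at the cost of extrinsic return, and how far the learned average sits from the profile gives a direct, readable summary of the policy's behaviour.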