Reinforcement learning (RL) is attracting growing attention from researchers in quantitative finance, as its agent-environment interaction framework aligns naturally with the decision-making process in many business problems. Most current financial applications of RL rely on model-free methods, which still face stability and adaptivity challenges. As cutting-edge model-based reinforcement learning (MBRL) algorithms mature in domains such as video games and robotics, we design a new approach that leverages resistance and support (RS) levels as regularization terms on the action in MBRL, improving the algorithm's efficiency and stability. The experimental results show that the RS level, used as a market-timing technique, enhances the performance of pure MBRL models across various measures, achieving higher profit with less risk. Moreover, our proposed method withstands large drops (smaller maximum drawdown) during the COVID-19 pandemic, when the financial market suffered an unpredictable crisis. We also investigate, through numerical experiments on the actor-critic network losses and the prediction error of the transition dynamics model, why controlling actions with resistance and support levels boosts MBRL. The results show that the RS indicators indeed help the MBRL algorithm converge faster in the early stage and obtain a smaller critic loss as training episodes increase.
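To make the idea of RS-level action regularization concrete, the following is a minimal, hypothetical sketch rather than the paper's actual implementation: resistance and support are estimated as the rolling maximum and minimum of recent prices, and a penalty term discourages the policy from increasing long exposure near resistance or short exposure near support. All names and parameters (`rs_levels`, `rs_penalty`, the window size, the margin, the weight) are illustrative assumptions.

```python
import numpy as np


def rs_levels(prices: np.ndarray, window: int = 20):
    """Estimate resistance/support as the rolling max/min of recent closing prices."""
    recent = prices[-window:]
    return recent.max(), recent.min()  # (resistance, support)


def rs_penalty(action: float, price: float, resistance: float, support: float,
               margin: float = 0.01, weight: float = 1.0) -> float:
    """Regularization term: penalize buying near resistance and selling near support.

    `action` is the target position in [-1, 1] (positive = long, negative = short).
    """
    penalty = 0.0
    # Penalize long actions when the price approaches the resistance level.
    if price >= resistance * (1.0 - margin) and action > 0:
        penalty += action * (price - resistance * (1.0 - margin)) / price
    # Penalize short actions when the price approaches the support level.
    if price <= support * (1.0 + margin) and action < 0:
        penalty += -action * (support * (1.0 + margin) - price) / price
    return weight * penalty


# Illustrative usage: subtract the penalty from the actor's objective
# (equivalently, add it to the actor loss) when training the MBRL policy.
prices = np.cumsum(np.random.randn(100)) + 100.0   # synthetic price path
resistance, support = rs_levels(prices)
action = 0.8                                        # proposed long position from the policy
regularized_objective = 0.0 - rs_penalty(action, prices[-1], resistance, support)
```

In this sketch the penalty acts purely as a soft constraint on the action; how the paper weights the term against the critic's value estimate is a design choice not specified here.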