Model-free Reinforcement Learning has achieved meaningful results in stable environments but, to this day, it remains problematic in regime-changing environments such as financial markets. In contrast, model-based RL is able to capture some fundamental and dynamic features of the environment, but it suffers from cognitive bias. In this work, we propose to combine the best of the two techniques by using model-free Deep Reinforcement Learning to select among various model-based approaches. Beyond past performance and volatility, we include additional contextual information, such as macro and risk-appetite signals, to account for implicit regime changes. We also adapt traditional RL methods to real-life situations by restricting training sets to past data only; hence, unlike K-fold cross-validation, we never use future information in the training data set. Building on traditional statistical methods, we use the traditional "walk-forward analysis", defined as successive training and testing over expanding periods, to assess the robustness of the resulting agent. Finally, we present the concept of statistical significance of differences, based on a two-tailed T-test, to highlight the ways in which our models differ from more traditional ones. Our experimental results show that our approach outperforms traditional financial baseline portfolio models, such as the Markowitz model, on almost all evaluation metrics commonly used in financial mathematics, namely net performance, Sharpe and Sortino ratios, maximum drawdown, and maximum drawdown over volatility.
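To make the evaluation protocol concrete, the following is a minimal sketch of the expanding-window splits used in walk-forward analysis: the training window only ever grows forward in time and the test window always lies strictly after it, so no future information can leak into training. Function and parameter names (`walk_forward_splits`, `initial_train_size`, `test_size`) are illustrative, not taken from the paper's code.

```python
import numpy as np

def walk_forward_splits(n_samples, initial_train_size, test_size):
    """Yield (train_idx, test_idx) pairs for walk-forward analysis:
    the training set expands over time and the test block always
    comes strictly after it (no look-ahead)."""
    start = initial_train_size
    while start + test_size <= n_samples:
        train_idx = np.arange(0, start)                  # all past data
        test_idx = np.arange(start, start + test_size)   # next unseen period
        yield train_idx, test_idx
        start += test_size

# Example: 1000 daily observations, 500 days of initial training,
# then successive 100-day out-of-sample test blocks.
for train_idx, test_idx in walk_forward_splits(1000, 500, 100):
    print(f"train [0, {train_idx[-1]}] -> test [{test_idx[0]}, {test_idx[-1]}]")
```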
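The sketch below illustrates, under common conventions, the evaluation metrics listed above (net performance, Sharpe and Sortino ratios, maximum drawdown, maximum drawdown over volatility) together with a two-tailed T-test comparing the strategy's returns against a baseline. The exact formula variants used in the paper (e.g. risk-free rate, annualization factor) are not specified here, so this is an assumed, simplified implementation with illustrative names and synthetic data.

```python
import numpy as np
from scipy import stats

def evaluation_metrics(returns, periods_per_year=252):
    """Common financial metrics computed from a series of periodic returns.
    Simplified conventions: no risk-free rate, annualization by sqrt(252)."""
    ann_return = np.mean(returns) * periods_per_year
    ann_vol = np.std(returns, ddof=1) * np.sqrt(periods_per_year)
    downside_vol = np.std(returns[returns < 0], ddof=1) * np.sqrt(periods_per_year)
    equity = np.cumprod(1 + returns)                      # cumulative wealth curve
    drawdown = 1 - equity / np.maximum.accumulate(equity) # distance from running peak
    max_dd = drawdown.max()
    return {
        "net_performance": equity[-1] - 1,
        "sharpe": ann_return / ann_vol,
        "sortino": ann_return / downside_vol,
        "max_drawdown": max_dd,
        "max_drawdown_over_vol": max_dd / ann_vol,
    }

# Two-tailed T-test on daily returns of the RL-driven portfolio versus a
# baseline (e.g. Markowitz): a small p-value indicates the performance
# difference is statistically significant. Data here is synthetic.
rng = np.random.default_rng(0)
rl_returns = rng.normal(0.0006, 0.01, 750)
baseline_returns = rng.normal(0.0003, 0.01, 750)
t_stat, p_value = stats.ttest_ind(rl_returns, baseline_returns)
print(evaluation_metrics(rl_returns))
print(f"t-statistic = {t_stat:.3f}, p-value = {p_value:.4f}")
```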