Deep Reinforcement Learning (DRL) algorithms can scale to previously intractable problems. DRL makes it possible to automate profit generation in the stock market by combining the financial asset price "prediction" step and the portfolio "allocation" step into one unified process, producing fully autonomous systems that interact with their environment and learn to make optimal decisions through trial and error. This work presents a DRL model that generates profitable trades in the stock market and overcomes the limitations of supervised learning approaches. We formulate the trading problem as a Partially Observable Markov Decision Process (POMDP), taking into account the constraints imposed by the stock market, such as liquidity and transaction costs. We then solve the formulated POMDP using the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, reporting a Sharpe ratio of 2.68 on an unseen test data set. From the point of view of stock market forecasting and intelligent decision-making, this paper demonstrates the superiority of DRL over other types of machine learning in financial markets and establishes its credibility and advantages for strategic decision-making.
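The Sharpe ratio quoted above is the mean excess return divided by the standard deviation of excess returns, usually annualized. The abstract does not state how the 2.68 figure was computed, so the following is only a minimal sketch under assumed conventions: per-period simple returns, 252 trading periods per year, and a zero risk-free rate by default. The function name `sharpe_ratio` and its parameters are illustrative and are not taken from the paper.

```python
import math
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from a sequence of per-period simple returns.

    Assumptions (not from the paper): risk_free_rate is an annual rate,
    spread evenly over periods_per_year; the sample standard deviation
    (n - 1 denominator) is used; at least two returns are supplied.
    """
    per_period_rf = risk_free_rate / periods_per_year
    excess = [r - per_period_rf for r in returns]
    sigma = stdev(excess)  # sample standard deviation of excess returns
    if sigma == 0:
        raise ValueError("Sharpe ratio is undefined for zero-variance returns")
    # Scale the per-period ratio by sqrt(periods) to annualize it.
    return mean(excess) / sigma * math.sqrt(periods_per_year)


if __name__ == "__main__":
    # Hypothetical daily returns of a strategy on held-out data.
    daily_returns = [0.01, -0.01, 0.02, 0.0]
    print(sharpe_ratio(daily_returns))
```

Under these conventions, transaction costs would be subtracted from each period's return before calling the function, so the ratio reflects net rather than gross performance.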