We investigate the potential of Multi-Objective Deep Reinforcement Learning for stock and cryptocurrency single-asset trading. In particular, we consider a Multi-Objective algorithm which generalizes over the reward function and the discount factor (i.e., these components are not specified a priori, but are incorporated in the learning process). Firstly, using several important assets (the cryptocurrency pairs BTCUSD, ETHUSDT, and XRPUSDT, and the stocks and indexes AAPL, SPY, and NIFTY50), we verify the reward generalization property of the proposed Multi-Objective algorithm, and we provide preliminary statistical evidence of its increased predictive stability over the corresponding Single-Objective strategy. Secondly, we show that the Multi-Objective algorithm has a clear edge over the corresponding Single-Objective strategy when the reward mechanism is sparse (i.e., when non-null feedback is infrequent over time). Finally, we discuss the generalization properties with respect to the discount factor. Our code is provided in open-source format.
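To make the reward-generalization idea concrete, the following is a minimal sketch, not the authors' implementation (which is available in their open-source repository): a tabular multi-objective Q-learning loop that stores one value per reward component, so a single learned table can be scalarized with arbitrary preference weights at decision time. All sizes, the toy dynamics, and the two-component reward are hypothetical.

```python
# Minimal sketch of multi-objective Q-learning with a vector-valued Q-table.
# Hypothetical setup: the environment, reward components, and sizes are
# illustrative, not taken from the paper's experiments.
import numpy as np

n_states, n_actions, n_objectives = 10, 3, 2   # hypothetical problem sizes
alpha, gamma, eps = 0.1, 0.95, 0.1             # gamma could itself be varied

rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions, n_objectives))  # one value per objective

def step(s, a):
    """Toy dynamics with a 2-component reward, e.g. (profit, risk penalty)."""
    s_next = (s + a) % n_states
    r = np.array([rng.normal(0.1 * a), -0.05 * a])  # illustrative reward vector
    return s_next, r

def act(s, w):
    """Epsilon-greedy with respect to the linear scalarization w @ Q[s]."""
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s] @ w))

w = np.array([1.0, 0.5])  # preference weights, chosen only at decision time
s = 0
for _ in range(5000):
    a = act(s, w)
    s_next, r = step(s, a)
    a_next = int(np.argmax(Q[s_next] @ w))     # greedy successor action
    # Vector-valued TD update: each reward component is learned jointly,
    # so the same table supports any choice of w afterwards.
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])
    s = s_next

print("scalarized action values at state 0:", Q[0] @ w)
```

Because the table is vector-valued, changing the weights `w` (or, with an extra table axis, the discount factor `gamma`) requires no retraining, which is the sense in which these components are "incorporated in the learning process" rather than fixed a priori.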