In this paper we pursue the question of a fully online trading algorithm (i.e. one that does not need offline training on previously gathered data). For this task we use Double Deep $Q$-learning in the episodic setting, with Fast Learning Networks approximating the expected reward $Q$. Additionally, we define the possible terminal states of an episode so as to introduce a mechanism that conserves part of the money in the trading pool when market conditions are seen as unfavourable. Some of this money is taken as profit and some is reused at a later time according to certain criteria. After describing the algorithm, we test it on the 1-minute-tick data for Cardano's price on Binance. We observe that the agent performs better than trading with randomly chosen actions at each timestep, and it does so both on the whole dataset and on different subsets capturing different market trends.
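For orientation, a minimal sketch of the standard Double Deep $Q$-learning target (the generic form due to van Hasselt et al.; the paper's specific formulation with Fast Learning Networks may differ) is
$$
y_t = r_t + \gamma\, Q\!\left(s_{t+1},\, \arg\max_{a} Q(s_{t+1}, a;\, \theta);\, \theta^{-}\right),
$$
where $\theta$ are the parameters of the online network used to select the action and $\theta^{-}$ are the parameters of the target network used to evaluate it, decoupling action selection from value estimation to reduce overestimation of $Q$.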