We introduce the first end-to-end Deep Reinforcement Learning (DRL) based framework for active high frequency trading. We train DRL agents to trade one unit of Intel Corporation stock using the Proximal Policy Optimization (PPO) algorithm. The training is performed on three contiguous months of high frequency Limit Order Book (LOB) data, of which the last month constitutes the validation data. To maximise the signal-to-noise ratio in the training data, we compose the latter by selecting only the training samples with the largest price changes. The test is then carried out on the following month of data. Hyperparameters are tuned using the Sequential Model-Based Optimization technique. We consider three different state characterizations, which differ in their LOB-based meta-features. Analysing the agents' performances on test data, we argue that the agents are able to create a dynamic representation of the underlying environment. They identify occasional regularities present in the data and exploit them to create long-term profitable trading strategies. Indeed, the agents learn trading strategies able to produce stable positive returns in spite of the highly stochastic and non-stationary environment.
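To make the setup concrete, the following is a minimal, illustrative sketch of the kind of pipeline the abstract describes, not the authors' implementation: a toy gymnasium environment exposing hypothetical LOB-derived features for a single-unit position, trained with PPO (here via stable-baselines3). The feature set, reward definition, synthetic data, and hyperparameters are all assumptions for illustration; in the paper, hyperparameters would instead be tuned with Sequential Model-Based Optimization.

```python
# Illustrative sketch only (not the authors' code): one-unit trading environment + PPO.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO


class SingleUnitLOBEnv(gym.Env):
    """Toy environment: observations are LOB-derived features, the action sets the position in {-1, 0, +1}."""

    def __init__(self, features: np.ndarray, mid_prices: np.ndarray):
        super().__init__()
        self.features = features          # shape (T, n_features), e.g. spreads, queue imbalances (assumed)
        self.mid_prices = mid_prices      # shape (T,), mid-price series
        self.action_space = spaces.Discrete(3)   # 0 = short, 1 = flat, 2 = long (one unit)
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf, shape=(features.shape[1],), dtype=np.float32
        )
        self.t = 0
        self.position = 0

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        self.position = 0
        return self.features[self.t].astype(np.float32), {}

    def step(self, action):
        self.position = int(action) - 1                      # map {0,1,2} -> {-1,0,+1}
        price_change = self.mid_prices[self.t + 1] - self.mid_prices[self.t]
        reward = self.position * price_change                # mark-to-market PnL for one unit (assumed reward)
        self.t += 1
        terminated = self.t >= len(self.mid_prices) - 1
        return self.features[self.t].astype(np.float32), float(reward), terminated, False, {}


# Synthetic placeholder data standing in for months of high frequency LOB snapshots.
rng = np.random.default_rng(0)
T, n_features = 10_000, 8
features = rng.normal(size=(T, n_features))
mid_prices = 50 + np.cumsum(rng.normal(scale=0.01, size=T))

env = SingleUnitLOBEnv(features, mid_prices)
model = PPO("MlpPolicy", env, learning_rate=3e-4, verbose=0)  # hyperparameters assumed, not tuned here
model.learn(total_timesteps=50_000)
```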