Optimal execution is a sequential decision-making problem for cost saving in algorithmic trading. Prior studies have shown that reinforcement learning (RL) can help decide order-splitting sizes. However, one problem remains open: how to place limit orders at appropriate prices? The key challenge lies in the "continuous-discrete duality" of the action space. On the one hand, a continuous action space based on percentage price changes is preferred for generalization. On the other hand, the trader must ultimately choose limit prices from a discrete set because of the tick size, which requires specialization for each stock with its own characteristics (e.g., liquidity and price range). We therefore need continuous control for generalization and discrete control for specialization. To this end, we propose a hybrid RL method that combines the advantages of both. We first use a continuous-control agent to scope out an action subset, and then deploy a fine-grained agent to choose a specific limit price within it. Extensive experiments show that our method has higher sample efficiency and better training stability than existing RL algorithms, and that it significantly outperforms previous learning-based methods for order execution.
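To make the two-stage action selection concrete, the following is a minimal Python sketch, not the paper's implementation: a continuous agent's percentage offset is mapped onto a small window of tick-aligned candidate prices (the "scoping" step), and a discrete agent then picks one candidate, here by a greedy argmax over toy Q-values. The tick size, window width, and the helper names scope_action_subset and choose_limit_price are illustrative assumptions.

```python
import numpy as np

TICK_SIZE = 0.01  # hypothetical tick size, for illustration only


def scope_action_subset(mid_price, pct_offset, window_ticks=3):
    """Coarse step: map the continuous agent's percentage offset to a
    small window of valid limit prices on the tick grid."""
    raw_price = mid_price * (1.0 + pct_offset)
    center_tick = round(raw_price / TICK_SIZE)
    candidate_ticks = np.arange(center_tick - window_ticks,
                                center_tick + window_ticks + 1)
    return candidate_ticks * TICK_SIZE


def choose_limit_price(candidates, q_values):
    """Fine-grained step: pick one candidate limit price, e.g. greedily
    from the discrete agent's value estimates for the candidates."""
    return candidates[int(np.argmax(q_values))]


if __name__ == "__main__":
    mid = 100.00
    pct_offset = 0.0005  # continuous agent proposes +0.05% vs. mid (toy value)
    candidates = scope_action_subset(mid, pct_offset)
    # Toy Q-values standing in for the fine-grained agent's output.
    q = np.random.default_rng(0).normal(size=candidates.shape)
    price = choose_limit_price(candidates, q)
    print("candidate prices:", np.round(candidates, 2))
    print(f"chosen limit price: {price:.2f}")
```

The split mirrors the abstract's motivation: the continuous proposal generalizes across stocks with different price levels, while the discrete choice specializes to the actual tick grid of the traded instrument.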