Reinforcement Learning in Queue-Reactive Models: Application to Optimal Execution

We investigate the use of Reinforcement Learning for the optimal execution of meta-orders, where the objective is to execute a large order incrementally over an extended period while minimizing implementation shortfall and market impact. Departing from traditional parametric approaches to modeling price dynamics and impact, we adopt a model-free, data-driven framework. Since policy optimization requires counterfactual feedback that historical data cannot provide, we employ the Queue-Reactive Model to generate realistic and tractable limit order book simulations that capture transient price impact as well as nonlinear and dynamic order flow responses. Methodologically, we train a Double Deep Q-Network agent on a state space comprising time, inventory, price, and depth variables, and evaluate its performance against established benchmarks. Numerical simulations show that the agent learns a policy that is both strategic and tactical, adapts effectively to order book conditions, and outperforms standard approaches across multiple training configurations. These findings provide strong evidence that model-free Reinforcement Learning can yield adaptive and robust solutions to the optimal execution problem.
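The defining feature of a Double Deep Q-Network is how it builds its bootstrap target: the online network selects the greedy next action and a separate target network evaluates it, which reduces the overestimation bias of the plain DQN max operator. The sketch below shows the textbook form of this target for a single transition. It is not the authors' implementation: plain lists stand in for the two networks' Q-value outputs at the next state, and the reading of a state as (time, inventory, price, depth) with actions as discrete order sizes is a hypothetical illustration.

```python
from typing import Sequence


def double_dqn_target(
    reward: float,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
    gamma: float,
    done: bool,
) -> float:
    """Textbook Double DQN target for one transition (illustrative sketch).

    next_q_online / next_q_target: Q-values of every action at the next
    state, as produced by the online and target networks (assumed inputs).
    """
    if done:
        # Terminal step (e.g. execution horizon reached): no bootstrapping.
        return reward
    # Selection: greedy action according to the online network.
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # Evaluation: value of that action according to the target network.
    return reward + gamma * next_q_target[best_action]


def dqn_target(
    reward: float,
    next_q_target: Sequence[float],
    gamma: float,
    done: bool,
) -> float:
    """Plain DQN target, for comparison: the target network both selects and evaluates."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


if __name__ == "__main__":
    # Hypothetical next-state Q-values for three order-size actions.
    online = [1.0, 3.0, 2.0]
    target = [10.0, 4.0, 20.0]
    print(double_dqn_target(-0.5, online, target, 0.9, False))  # uses action 1
    print(dqn_target(-0.5, target, 0.9, False))  # uses the max over target values
```

With these inputs the online network prefers action 1, so the Double DQN target is -0.5 + 0.9 × 4.0 = 3.1, whereas the plain DQN target takes the target network's maximum and gives -0.5 + 0.9 × 20.0 = 17.5. The gap illustrates the upward bias that decoupling selection from evaluation is meant to curb.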