We consider the learning dynamics of a single reinforcement learning optimal execution trading agent interacting with an event-driven agent-based financial market model. Trading takes place asynchronously through a matching engine in event time. The optimal execution agent is studied at different initial order sizes and with differently sized state spaces. The resulting impact on the agent-based model and market is assessed using a calibration approach that explores changes in the empirical stylised facts and price impact curves. Convergence, volume trajectory and action trace plots are used to visualise the learning dynamics. This demonstrates how an optimal execution agent learns optimal trading decisions inside a simulated reactive market framework, and how this in turn generates a back-reaction that changes the simulated market through the introduction of strategic order-splitting.