We consider the learning dynamics of a single reinforcement learning optimal execution trading agent when it interacts with an event-driven agent-based financial market model. Trading takes place asynchronously through a matching engine in event time. The optimal execution agent is considered at different levels of initial order size and with differently sized state spaces. The resulting impact on the agent-based model and market is assessed using a calibration approach that explores changes in the empirical stylised facts and price impact curves. Convergence, volume-trajectory and action-trace plots are used to visualise the learning dynamics. The smaller state space agents converged on the number of states visited much faster than the larger state space agents, and they were able to begin learning to trade intuitively using the spread and volume states. We find that the moments of the model are robust to the impact of the learning agents, except for the Hurst exponent, which was lowered by the introduction of strategic order-splitting. The introduction of the learning agent preserves the shape of the price impact curves but can reduce the trade-sign autocorrelations when its trading volumes increase.