We study a game between liquidity provider and liquidity taker agents interacting in an over-the-counter market, of which foreign exchange is the typical example. We show how a suitable design of parameterized families of reward functions, coupled with associated shared policy learning, constitutes an efficient solution to this problem. Specifically, we show that our deep-reinforcement-learning-driven agents, by playing against each other, learn emergent behaviors relative to a wide spectrum of incentives encompassing profit-and-loss, optimal execution and market share. In particular, we find that liquidity providers naturally learn to balance hedging and skewing as a function of their incentives, where skewing refers to setting their buy and sell prices asymmetrically as a function of their inventory. We further introduce a novel RL-based calibration algorithm, which we found to perform well at imposing constraints on the game equilibrium, on both toy and real market data.
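To make the notion of skewing concrete, the following minimal sketch shifts both quotes against the current inventory. It is an illustration only: the linear rule, the function name `skewed_quotes`, and the parameters `half_spread` and `skew_per_unit` are assumptions introduced here, not the policy learned by the agents in this work.

```python
def skewed_quotes(mid, half_spread, inventory, skew_per_unit):
    """Return (bid, ask) for a hypothetical liquidity provider.

    Assumption: a simple linear skew. A positive (long) inventory shifts
    both prices down, which makes the ask more attractive to buyers and
    the bid less attractive to sellers, so flow tends to reduce the
    inventory. A negative (short) inventory shifts both prices up.
    """
    shift = -skew_per_unit * inventory
    bid = mid + shift - half_spread
    ask = mid + shift + half_spread
    return bid, ask


if __name__ == "__main__":
    # Flat inventory: quotes are symmetric around the mid price.
    print(skewed_quotes(mid=100.0, half_spread=0.5, inventory=0, skew_per_unit=0.25))
    # Long inventory: both quotes move down, so they are asymmetric around the mid.
    print(skewed_quotes(mid=100.0, half_spread=0.5, inventory=2, skew_per_unit=0.25))
```

With zero inventory the quotes sit symmetrically around the mid price; with a nonzero inventory the spread is unchanged but its center moves away from the mid, which is the asymmetry the abstract calls skewing. Hedging, by contrast, would reduce the inventory directly by trading in the market rather than by adjusting quotes.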