Pair trading is one of the most effective statistical arbitrage strategies: it seeks market-neutral profit by hedging a pair of selected assets. Existing methods generally decompose the task into two separate steps, pair selection and trading. However, decoupling these two closely related subtasks blocks information propagation between them and limits overall performance. When pair selection ignores trading performance, it can select assets whose price movements are irrelevant to each other, while an agent trained only for trading can overfit to the selected assets because it sees no historical information about other assets. To address this, we propose a paradigm that treats automatic pair trading as a unified task rather than a two-step pipeline. We design a hierarchical reinforcement learning framework that jointly learns and optimizes the two subtasks: a high-level policy selects two assets from all possible combinations, and a low-level policy then performs a series of trading actions. Experimental results on real-world stock data demonstrate the effectiveness of our method compared with existing pair selection and trading methods.
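The two-level structure described above can be illustrated with a toy sketch. This is not the paper's framework: the high-level policy is swapped for an epsilon-greedy bandit over all asset pairs, and the low-level policy is a fixed threshold rule on the price spread rather than a learned agent. All names (`trade_pair`, `select_and_trade`, `make_toy_prices`), the threshold, and the synthetic prices are assumptions made for illustration. The only point it shows is that the trading reward feeds back into pair selection.

```python
import itertools
import random


def trade_pair(prices_a, prices_b, threshold=1.0):
    """Low-level stand-in (assumed rule, not a learned policy).

    Estimates the mean spread on the first half of the series, then trades
    the second half: short the spread when it is above the mean by more than
    `threshold`, long when it is below, otherwise keep the current position.
    Returns the total profit on the spread.
    """
    spread = [a - b for a, b in zip(prices_a, prices_b)]
    split = len(spread) // 2
    mean = sum(spread[:split]) / split
    position = 0  # +1 long the spread, -1 short the spread, 0 flat
    profit = 0.0
    for t in range(split, len(spread) - 1):
        deviation = spread[t] - mean
        if deviation > threshold:
            position = -1
        elif deviation < -threshold:
            position = 1
        profit += position * (spread[t + 1] - spread[t])
    return profit


def select_and_trade(prices, episodes=50, epsilon=0.1, seed=0):
    """High-level stand-in: epsilon-greedy bandit over all asset pairs.

    Each pair's value estimate is the running mean of the trading profit
    returned by the low-level rule, so selection depends on trading results.
    """
    rng = random.Random(seed)
    pairs = list(itertools.combinations(sorted(prices), 2))
    value = {pair: 0.0 for pair in pairs}
    count = {pair: 0 for pair in pairs}

    def run(pair):
        reward = trade_pair(prices[pair[0]], prices[pair[1]])
        count[pair] += 1
        value[pair] += (reward - value[pair]) / count[pair]

    for pair in pairs:  # try every pair once before exploiting
        run(pair)
    for _ in range(episodes):
        explore = rng.random() < epsilon
        run(rng.choice(pairs) if explore else max(pairs, key=value.get))
    return max(pairs, key=value.get), value


def make_toy_prices(n=40):
    """Synthetic data (assumed): A and B share a trend, C drifts away."""
    cycle = [0.0, 2.0, 0.0, -2.0]
    base = [10 + 0.1 * t for t in range(n)]
    return {
        "A": base,
        "B": [base[t] + cycle[t % 4] for t in range(n)],
        "C": [10 + 1.0 * t for t in range(n)],
    }
```

On the synthetic data, calling `select_and_trade(make_toy_prices())` should favor the pair whose spread reverts to its mean, since the diverging pairs lose money under the threshold rule. In a joint framework of the kind the abstract describes, both levels would be learned policies trained together.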