The Limit Order Book Recreation Model (LOBRM): An Extended Analysis

From arXiv. 16 pages; preprint accepted for publication in the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2021).

The limit order book (LOB) depicts the fine-grained demand and supply relationship for financial assets and is widely used in market microstructure studies. Nevertheless, the limited availability and high cost of LOB data restrict its wider application. The LOB recreation model (LOBRM) was recently proposed to bridge this gap by synthesizing the LOB from trades and quotes (TAQ) data. However, the original LOBRM study had two limitations: (1) experiments were conducted on a relatively small dataset containing only one day of LOB data; and (2) training and testing were performed in a non-chronological fashion, which essentially re-frames the task as interpolation and potentially introduces lookahead bias. In this study, we extend the research on LOBRM and further validate its use in real-world application scenarios. We first advance the workflow of LOBRM by (1) adding a time-weighted z-score standardization for the LOB and (2) substituting the ordinary differential equation kernel with an exponential decay kernel to lower computational complexity. Experiments are conducted on the extended LOBSTER dataset in a chronological fashion, as the model would be used in a real-world application. We find that (1) LOBRM with the decay kernel is superior to traditional non-linear models, and module ensembling is effective; (2) prediction accuracy is negatively related to the volatility of order volumes resting in the LOB; (3) the proposed sparse encoding method for TAQ exhibits good generalization ability and can facilitate manifold tasks; and (4) the influence of stochastic drift on prediction accuracy can be alleviated by increasing historical samples.
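The abstract does not define the time-weighted z-score, so the following is only a minimal sketch of one plausible reading: each LOB observation is weighted by how long it stayed in force before the next update, and the weighted mean and standard deviation are then used to standardize the values. The function name `time_weighted_zscore`, its parameters, and the duration-as-weight assumption are illustrative and are not taken from the paper.

```python
import math


def time_weighted_zscore(values, timestamps, end_time):
    """Standardize values, weighting each by the time it remained in force.

    Assumption (not from the paper): values[i] is in force from
    timestamps[i] until timestamps[i + 1]; the last value is in force
    until end_time.
    """
    if not values or len(values) != len(timestamps):
        raise ValueError("values and timestamps must be non-empty and equal in length")

    # Duration of each observation: time until the next update.
    next_times = list(timestamps[1:]) + [end_time]
    durations = [nxt - cur for cur, nxt in zip(timestamps, next_times)]
    total = sum(durations)
    if total <= 0 or any(d < 0 for d in durations):
        raise ValueError("timestamps must be non-decreasing and span a positive interval")

    # Duration-weighted mean and (population) standard deviation.
    mean = sum(v * d for v, d in zip(values, durations)) / total
    variance = sum(d * (v - mean) ** 2 for v, d in zip(values, durations)) / total
    std = math.sqrt(variance)

    if std == 0:
        # A constant series carries no scale information; map it to zeros.
        return [0.0] * len(values), mean, std
    return [(v - mean) / std for v in values], mean, std


# A volume of 100 persists for 9 seconds, then a volume of 200 for 1 second.
z_scores, tw_mean, tw_std = time_weighted_zscore([100, 200], [0, 9], end_time=10)
```

Under this weighting, a volume that rests in the book for a long time pulls the mean toward itself more strongly than a short-lived spike, which a plain sample mean over irregularly spaced updates would not do.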
