Financial markets are a source of non-stationary, multidimensional time series that have been drawing attention for decades. Each financial instrument has its own properties that change over time, which makes their analysis a complex task. Improving the understanding of financial time series and developing methods for analyzing them are essential for successful operation in financial markets. In this study, we propose a volume-based data pre-processing method that makes financial time series more suitable for machine learning pipelines. We assess the performance of the method with a statistical approach: we formally state the hypotheses, set up the associated classification tasks, compute effect sizes with confidence intervals, and run statistical tests to validate the hypotheses. We additionally assess the trading performance of the proposed method on historical data and compare it with a previously published approach. Our analysis shows that the proposed volume-based method enables successful classification of financial time series patterns and yields better classification performance than a price action-based method, excelling specifically on more liquid financial instruments. Finally, we propose an approach for obtaining feature interactions directly from tree-based models, using the CatBoost estimator as an example, and formally assess the relatedness of this approach to SHAP feature interactions, with a positive outcome.
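The abstract mentions computing effect sizes with confidence intervals but does not say which effect size or interval procedure is used. As an illustration only, the sketch below assumes Cohen's d (pooled standard deviation) between two groups of classification scores, for example per-fold scores of the volume-based and the price action-based pipelines, together with a percentile bootstrap confidence interval. The function names and the score values are hypothetical placeholders, not results from the study.

```python
import random
import statistics


def cohens_d(a, b):
    """Cohen's d for two independent samples using the pooled standard deviation."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled_sd = (((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd


def bootstrap_ci(a, b, stat=cohens_d, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(a, b).

    Each bootstrap replicate resamples a and b independently with replacement.
    """
    rng = random.Random(seed)
    boots = []
    for _ in range(n_boot):
        ra = [rng.choice(a) for _ in a]
        rb = [rng.choice(b) for _ in b]
        try:
            boots.append(stat(ra, rb))
        except ZeroDivisionError:
            # Both resamples have zero variance, so d is undefined; skip this replicate.
            continue
    boots.sort()
    lo = boots[int((alpha / 2) * len(boots))]
    hi = boots[int((1 - alpha / 2) * len(boots)) - 1]
    return lo, hi


# Hypothetical per-fold scores (placeholder numbers, not results from the study).
volume_based = [0.62, 0.65, 0.61, 0.66, 0.64]
price_action = [0.58, 0.60, 0.57, 0.61, 0.59]

d = cohens_d(volume_based, price_action)
lo, hi = bootstrap_ci(volume_based, price_action)
print(f"Cohen's d = {d:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```

A hypothesis test (for example, a rank-based test on the same scores) would be run separately; the effect size and its interval describe how large the difference is, not only whether it is significant.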