The financial domain has proven to be a fertile source of challenging machine learning problems across a variety of tasks, including prediction, clustering, and classification. Researchers can access an abundance of time-series data, and even modest performance improvements can translate into significant additional value. In this work, we apply case-based reasoning to an important task in this domain: industry sector classification from historical stock returns time-series data. We discuss why time-series data can present significant representational challenges for conventional case-based reasoning approaches and, in response, propose a novel representation based on stock returns embeddings, which can be readily calculated from raw stock returns data. We argue that this representation is well suited to case-based reasoning, and we evaluate our approach on a large-scale public dataset for the industry sector classification task, demonstrating substantial performance improvements over several baselines that use more conventional representations.