Learning Embedded Representation of the Stock Correlation Matrix using Graph Machine Learning

Understanding non-linear relationships among financial instruments has applications across the investment process, from risk management and portfolio construction to trading strategies. Here, we focus on the interconnectedness among stocks based on their correlation matrix, which we represent as a network: nodes represent individual stocks, and the weighted links between pairs of nodes represent the corresponding pairwise correlation coefficients. Traditional network science techniques, which are used extensively in the financial literature, require handcrafted features such as centrality measures to understand such correlation networks. However, manually enumerating all such handcrafted features can quickly become a daunting task. Instead, we propose a new approach for studying nuances and relationships within the correlation network algorithmically, using a graph machine learning algorithm called Node2Vec. In particular, the algorithm compresses the network into a lower-dimensional continuous space, called an embedding, where pairs of nodes that the algorithm identifies as similar are placed closer to each other. Using log returns of S&P 500 stocks, we show that our proposed algorithm can learn such an embedding from the corresponding correlation network. To evaluate the embeddings and identify the optimal one, we define various domain-specific quantitative (and objective) and qualitative metrics inspired by metrics used in Natural Language Processing (NLP). Further, we discuss various applications of the embeddings in investment management.
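The pipeline described above (log returns, pairwise correlations, a weighted network, Node2Vec-style walks) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation. The tickers `AAA` to `DDD`, the synthetic prices, the use of absolute correlation as a non-negative edge weight, the optional `min_weight` threshold, and the values of `p` and `q` are all assumptions made for illustration; the abstract does not specify any of them. The final step of Node2Vec, training a skip-gram model on the walks to obtain the embedding vectors, is omitted here.

```python
import math
import random


def log_returns(prices):
    """Log returns r_t = ln(P_t / P_{t-1}) for one price series."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def correlation_graph(returns, min_weight=0.0):
    """Weighted graph with graph[u][v] = |corr(u, v)|.

    Assumption: absolute correlation is used so that weights are
    non-negative; edges below min_weight are dropped (hypothetical knob).
    """
    names = list(returns)
    graph = {u: {} for u in names}
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            w = abs(pearson(returns[u], returns[v]))
            if w >= min_weight:
                graph[u][v] = w
                graph[v][u] = w
    return graph


def node2vec_walk(graph, start, length, p, q, rng):
    """One second-order biased random walk in the style of Node2Vec."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = list(graph[cur])
        if not nbrs:  # isolated node: stop early
            break
        if len(walk) == 1:
            weights = [graph[cur][n] for n in nbrs]
        else:
            prev = walk[-2]
            weights = []
            for n in nbrs:
                if n == prev:            # step back to the previous node
                    bias = 1.0 / p
                elif n in graph[prev]:   # neighbor shared with the previous node
                    bias = 1.0
                else:                    # move further away from the previous node
                    bias = 1.0 / q
                weights.append(graph[cur][n] * bias)
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


# Synthetic daily prices for four hypothetical tickers (not real data).
rng = random.Random(0)
prices = {}
for name in ["AAA", "BBB", "CCC", "DDD"]:
    series = [100.0]
    for _ in range(250):
        series.append(series[-1] * math.exp(rng.gauss(0.0, 0.01)))
    prices[name] = series

returns = {name: log_returns(series) for name, series in prices.items()}
graph = correlation_graph(returns)

# Five walks of ten nodes per starting stock; p and q are illustrative values.
walks = [
    node2vec_walk(graph, start, 10, p=1.0, q=0.5, rng=rng)
    for start in graph
    for _ in range(5)
]
```

With the default `min_weight=0.0` the graph is complete, so every neighbor is also adjacent to the previous node and the `1/q` branch is never taken; `q` only affects the walks once some edges are removed. In a full Node2Vec pipeline the walks would be treated as sentences and passed to a skip-gram model, which is where the connection to NLP-style evaluation of embeddings comes from.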
