Portfolio management aims to maximize the return on investment while minimizing risk by continuously reallocating the assets that make up the portfolio. These assets are not independent; they are correlated over short time periods. DeepPocket, a graph convolutional reinforcement learning framework, is proposed to exploit the time-varying interrelations between financial instruments. These interrelations are represented by a graph whose nodes correspond to the financial instruments and whose edges correspond to a pairwise correlation function between assets. DeepPocket consists of a restricted stacked autoencoder for feature extraction, a convolutional network that collects underlying local information shared among financial instruments, and an actor-critic reinforcement learning agent. The actor-critic structure contains two convolutional networks: the actor learns and enforces an investment policy, which is in turn evaluated by the critic to determine the best course of action, constantly reallocating the portfolio assets to optimize the expected return on investment. The agent is initially trained offline with online stochastic batching on historical data. As new data become available, it is trained online with a passive concept drift approach to handle unexpected changes in their distribution. DeepPocket is evaluated on five real-life datasets over three distinct investment periods, including the Covid-19 crisis, and clearly outperforms market indexes.
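The graph representation described above can be illustrated with a minimal sketch. The abstract does not specify the correlation function or the form of the graph convolution, so everything below is an assumption: Pearson correlation of returns stands in for the pairwise correlation function, the `threshold` parameter and the function names `pearson`, `correlation_graph`, and `propagate` are hypothetical, and the propagation step uses the symmetric normalization with self-loops that is common in graph convolutional networks, not necessarily the one used by DeepPocket. Learned weights and nonlinearities are omitted.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length return series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def correlation_graph(returns, threshold=0.5):
    """Weighted adjacency matrix over assets.

    Assumption: an edge (i, j) is kept when |corr(i, j)| >= threshold,
    and its weight is the absolute correlation.
    """
    n = len(returns)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w = abs(pearson(returns[i], returns[j]))
            if w >= threshold:
                adj[i][j] = adj[j][i] = w
    return adj

def propagate(adj, features):
    """One propagation step: D^{-1/2} (A + I) D^{-1/2} X (no learned weights)."""
    n = len(adj)
    # Add self-loops so each asset keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    dim = len(features[0])
    out = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            coef = a_hat[i][j] / math.sqrt(deg[i] * deg[j])
            for k in range(dim):
                out[i][k] += coef * features[j][k]
    return out

# Toy data (made up for illustration): assets 0 and 1 move together,
# asset 2 is uncorrelated with both.
returns = [
    [0.01, -0.02, 0.03, 0.00],
    [0.02, -0.04, 0.06, 0.00],
    [0.01, -0.01, -0.01, 0.01],
]
adj = correlation_graph(returns, threshold=0.5)
features = [[1.0], [3.0], [5.0]]  # one hypothetical feature per asset
smoothed = propagate(adj, features)
```

In this toy setup the two correlated assets exchange information and end up with the same averaged feature, while the uncorrelated asset keeps its own value. Recomputing `adj` over a sliding window of returns would be one way to capture the time-varying nature of the interrelations.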