Markov Decision Processes (MDPs) are an effective way to formally describe many machine learning problems. Recently, MDPs have also emerged as a powerful framework for modeling financial trading tasks; for example, financial MDPs can model different market scenarios. However, learning a (near-)optimal policy for each of these financial MDPs can be very time-consuming, especially when nothing is known about the policy to begin with. An alternative is to find a similar financial MDP whose policy has already been learned, and to reuse that policy when learning a policy for the new financial MDP. Such knowledge transfer between market scenarios raises two issues: how to measure the similarity between financial MDPs, and how to use this similarity measure to transfer knowledge between financial MDPs effectively. This paper addresses both issues. Regarding the first, it analyzes three similarity metrics based on conceptual, structural, and performance aspects of the financial MDPs. Regarding the second, it uses Probabilistic Policy Reuse to balance exploitation and exploration when learning a new financial MDP, according to the similarity of the previous financial MDPs whose knowledge is reused.
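The Probabilistic Policy Reuse step mentioned above can be sketched as follows: with some probability the agent follows the reused (past) policy, and otherwise it acts ε-greedily on the Q-values of the new task. This is a minimal illustrative sketch, not the paper's implementation; the function and parameter names (`psi`, `epsilon`, `past_policy`, `q_values`) are assumptions introduced here for clarity.

```python
import random


def pi_reuse_action(state, past_policy, q_values, actions, psi, epsilon):
    """One action-selection step of a pi-reuse-style strategy.

    With probability psi, follow the policy transferred from a similar
    MDP; otherwise act epsilon-greedily on the new task's Q-values.
    All names are illustrative, not taken from the paper.
    """
    if random.random() < psi:
        # Exploit the knowledge transferred from the similar financial MDP.
        return past_policy[state]
    if random.random() < epsilon:
        # Explore the new financial MDP at random.
        return random.choice(actions)
    # Act greedily with respect to the new task's current Q-values.
    return max(actions, key=lambda a: q_values[(state, a)])
```

In practice, `psi` would typically be set (or decayed) according to the measured similarity between the past and the new financial MDP: the more similar the MDPs, the more the transferred policy is trusted.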