This article combines factor investing and reinforcement learning (RL). The agent learns through sequential random allocations that depend on firms' characteristics. Using Dirichlet distributions as the driving policy, we derive closed forms for the policy gradients and analytical properties of the performance measure. This enables the implementation of REINFORCE methods, which we apply to a large dataset of US equities. Across a wide range of parametric choices, our results indicate that RL-based portfolios are very close to the equally-weighted (1/N) allocation. This implies that the agent learns to be *agnostic* with regard to factors, which can partly be explained by cross-sectional regressions showing strong time variation in the relationship between returns and firm characteristics.
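To make the idea of a Dirichlet policy with a closed-form policy gradient concrete, here is a minimal sketch in plain Python. It is not the article's implementation: the abstract does not state how characteristics map to Dirichlet parameters, so the exponential-linear link `alpha_i = exp(theta . x_i)`, the use of the one-period portfolio return as reward, the absence of a baseline, and all function names are illustrative assumptions.

```python
import math
import random


def digamma(x):
    """Digamma function for x > 0 (recurrence, then asymptotic series)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def concentrations(theta, X):
    """ASSUMED link: alpha_i = exp(theta . x_i), one alpha per asset."""
    return [math.exp(sum(t * x for t, x in zip(theta, row))) for row in X]


def sample_weights(alpha, rng):
    """Draw portfolio weights from Dirichlet(alpha) via normalized gammas."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [gi / total for gi in g]


def log_density(w, alpha):
    """Log of the Dirichlet density at weights w."""
    return (math.lgamma(sum(alpha))
            - sum(math.lgamma(a) for a in alpha)
            + sum((a - 1.0) * math.log(wi) for a, wi in zip(alpha, w)))


def grad_log_density(theta, X, w):
    """Closed-form gradient of log pi(w | theta) with respect to theta."""
    alpha = concentrations(theta, X)
    psi_sum = digamma(sum(alpha))
    grad = [0.0] * len(theta)
    for a, wi, row in zip(alpha, w, X):
        # d log pi / d alpha_i = psi(sum alpha) - psi(alpha_i) + log w_i
        d_alpha = psi_sum - digamma(a) + math.log(wi)
        for k, x in enumerate(row):
            # chain rule through the assumed link: d alpha_i / d theta_k = alpha_i * x_ik
            grad[k] += d_alpha * a * x
    return grad


def reinforce_step(theta, X, asset_returns, lr, rng):
    """One REINFORCE update; reward is the sampled portfolio's return (assumed)."""
    w = sample_weights(concentrations(theta, X), rng)
    reward = sum(wi * r for wi, r in zip(w, asset_returns))
    grad = grad_log_density(theta, X, w)
    return [t + lr * reward * g for t, g in zip(theta, grad)]
```

With `theta` equal to the zero vector, every `alpha_i` equals 1 under this link, so the policy is uniform over the simplex and its mean allocation is exactly 1/N. That gives one way to read a near-1/N outcome within this toy parameterization.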