We study the learning dynamics of self-predictive learning for reinforcement learning, a family of algorithms that learn representations by minimizing the prediction error of their own future latent representations. Despite its recent empirical success, such algorithms have an apparent defect: trivial representations (such as constants) minimize the prediction error, yet it is obviously undesirable to converge to such solutions. Our central insight is that careful design of the optimization dynamics is critical to learning meaningful representations. We identify that faster-paced optimization of the predictor and semi-gradient updates on the representation are crucial to preventing representation collapse. Then, in an idealized setup, we show that the self-predictive learning dynamics carry out spectral decomposition on the state transition matrix, effectively capturing information about the transition dynamics. Building on these theoretical insights, we propose bidirectional self-predictive learning, a novel self-predictive algorithm that learns two representations simultaneously. We examine the robustness of our theoretical insights with a number of small-scale experiments, and showcase the promise of the novel representation learning algorithm with large-scale experiments.
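To make the two design choices concrete, here is a minimal NumPy sketch (an illustrative toy setup, not the paper's actual implementation) of self-predictive learning on a tabular MDP with a known transition matrix `T`. The predictor `P` is re-solved in closed form at every step (the faster-paced predictor optimization), and the representation `Phi` is updated with a semi-gradient that treats the bootstrapped target `T @ Phi` as a constant. All names and hyperparameters here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, k = 20, 4  # toy MDP size and representation dimension

# Random row-stochastic transition matrix T.
T = rng.random((n_states, n_states))
T /= T.sum(axis=1, keepdims=True)

Phi = rng.normal(size=(n_states, k))  # one row per state: the representation
lr = 0.1

for _ in range(500):
    # Fast predictor: solve P = argmin_P ||Phi @ P - T @ Phi||^2 in closed form,
    # i.e. the predictor is optimized much faster than the representation.
    P, *_ = np.linalg.lstsq(Phi, T @ Phi, rcond=None)
    # Semi-gradient update: the target T @ Phi is treated as a constant
    # (stop-gradient), so the gradient flows only through the prediction side.
    grad = (Phi @ P - T @ Phi) @ P.T
    Phi -= lr * grad

# With the optimal predictor, the semi-gradient increment is orthogonal to the
# current column space of Phi, so the representation cannot collapse in rank.
print(np.linalg.matrix_rank(Phi))
```

Note that replacing the closed-form solve with a slowly updated `P`, or using the full gradient (which also differentiates through the target), removes the collapse-avoiding mechanism that the abstract identifies.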