The lottery ticket hypothesis questions the role of overparameterization in supervised deep learning. But how is the performance of winning lottery tickets affected by the distributional shift inherent to reinforcement learning problems? In this work, we address this question by comparing sparse agents that have to cope with the non-stationarity of the exploration-exploitation problem with supervised agents trained to imitate an expert. We show that feed-forward networks trained with behavioural cloning can be pruned to higher levels of sparsity than networks trained with reinforcement learning, without performance degradation. This suggests that, in order to handle the RL-specific distributional shift, agents require more degrees of freedom. Using a set of carefully designed baseline conditions, we find that the majority of the lottery ticket effect in both learning paradigms can be attributed to the identified mask rather than the weight initialization. The input layer mask selectively prunes entire input dimensions that turn out to be irrelevant to the task at hand. At a moderate level of sparsity, the mask identified by iterative magnitude pruning yields minimal task-relevant representations, i.e., an interpretable inductive bias. Finally, we propose a simple initialization rescaling which promotes the robust identification of sparse task representations in low-dimensional control tasks.
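To make the mask-identification procedure referenced above concrete, the following is a minimal sketch of iterative magnitude pruning with weight rewinding, the standard lottery ticket protocol. The `train` function and all numerical values are illustrative assumptions, not the paper's actual training setup (which would be behavioural cloning or an RL algorithm).

```python
# Sketch of iterative magnitude pruning (IMP) with rewinding to the original
# initialization. Each round: train, prune the smallest-magnitude surviving
# weights, reset the survivors to their initial values, and repeat.
import numpy as np

rng = np.random.default_rng(0)

def train(weights, mask, steps=100):
    """Placeholder training loop (assumption): only unmasked weights are updated."""
    for _ in range(steps):
        grad = rng.normal(size=weights.shape)    # stand-in for a real task gradient
        weights = weights - 0.01 * grad * mask   # masked gradient step
    return weights

def imp(init_weights, prune_fraction=0.2, rounds=5):
    """Return a sparse mask and the rewound 'winning ticket' weights."""
    mask = np.ones_like(init_weights)
    weights = init_weights.copy()
    for r in range(rounds):
        weights = train(weights, mask)
        # Prune a fraction of the remaining weights by magnitude.
        remaining = np.flatnonzero(mask)
        k = int(prune_fraction * remaining.size)
        smallest = remaining[np.argsort(np.abs(weights.flat[remaining]))[:k]]
        mask.flat[smallest] = 0.0
        # Rewind surviving weights to their original initialization.
        weights = init_weights * mask
        print(f"round {r}: sparsity = {1 - mask.mean():.2f}")
    return mask, weights

mask, ticket = imp(rng.normal(size=(64, 64)))
```

In the comparison described in the abstract, the same procedure is applied to agents trained with behavioural cloning and with reinforcement learning, and the resulting masks and rewound weights are evaluated separately to disentangle the contribution of each.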