The lottery ticket hypothesis questions the role of overparameterization in supervised deep learning. But how does the distributional shift inherent to the reinforcement learning problem affect the performance of winning lottery tickets? In this work, we show that feed-forward networks trained via supervised policy distillation and via reinforcement learning can be pruned to the same level of sparsity. Furthermore, we establish the existence of winning tickets for both on- and off-policy methods in a visual navigation task and a classic control task. Using a set of carefully designed baseline conditions, we find that the majority of the lottery ticket effect in reinforcement learning can be attributed to the identified mask rather than the weight initialization. The resulting masked observation space eliminates redundant information and yields a minimal task-relevant representation. The mask identified by iterative magnitude pruning therefore provides an interpretable inductive bias, and its costly generation can be amortized by training dense agents directly on the low-dimensional masked input, at lower computational cost.
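For concreteness, the following is a minimal PyTorch sketch of the iterative magnitude pruning procedure with weight rewinding referenced above: train, prune the smallest-magnitude surviving weights per layer, rewind the survivors to their initial values, and repeat. The `train_agent` callback is a hypothetical stand-in for the RL or policy-distillation training loop, and the pruning fraction and round count are illustrative, not the settings used in the paper.

```python
import copy

import torch


def iterative_magnitude_pruning(model, train_agent, rounds=5, prune_frac=0.2):
    """Sketch of IMP with weight rewinding. Returns per-layer binary masks;
    the initialization together with the final mask forms the winning ticket."""
    init_state = copy.deepcopy(model.state_dict())
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters()
             if "weight" in n}
    for _ in range(rounds):
        train_agent(model, masks)  # hypothetical: train under the current mask
        with torch.no_grad():
            for name, param in model.named_parameters():
                if name not in masks:
                    continue
                alive = param[masks[name].bool()].abs()
                k = int(prune_frac * alive.numel())
                if k == 0:
                    continue
                # Prune the k smallest-magnitude weights that survived so far.
                threshold = alive.sort().values[k - 1]
                masks[name] *= (param.abs() > threshold).float()
                # Rewind surviving weights to their initial values.
                param.copy_(init_state[name] * masks[name])
    return masks
```

Under this sketch, inspecting the first-layer mask after the final round reveals which observation dimensions survive pruning; a dense agent can then be trained on only those input dimensions, which is how the costly mask generation can be amortized.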