Neural control of memory-constrained, agile robots requires small yet highly performant models. We leverage graph hypernetworks to learn graph hyperpolicies trained with off-policy reinforcement learning, resulting in networks that are two orders of magnitude smaller than commonly used networks yet encode policies comparable to those encoded by much larger networks trained on the same task. We show that our method can be appended to any off-policy reinforcement learning algorithm, without any change in hyperparameters, by presenting results across locomotion and manipulation tasks. Further, we obtain an array of working policies with differing numbers of parameters, allowing us to pick the network best suited to the memory constraints of a system. Training multiple policies with our method is as sample efficient as training a single policy. Finally, we provide a method to select the best architecture given a constraint on the number of parameters. Project website: https://sites.google.com/usc.edu/graphhyperpolicy
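To make the idea concrete, below is a minimal sketch (not the authors' implementation; all class and parameter names are hypothetical) of a graph-hypernetwork-style policy in PyTorch: a single hypernetwork maps a per-layer architecture descriptor to the weights of a target MLP policy, so one set of hypernetwork parameters can emit policies of many different sizes, and the one that fits the memory budget can be kept at deployment time.

```python
# Hedged sketch of a hypernetwork that generates MLP policy weights from an
# architecture descriptor. Assumptions: per-layer descriptors are simple
# (in_width, out_width, depth) tuples; the paper's graph encoder is not reproduced.
import torch
import torch.nn as nn


class HyperPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, embed_dim=16, hidden=128, max_width=64):
        super().__init__()
        self.obs_dim, self.act_dim, self.max_width = obs_dim, act_dim, max_width
        # Encodes one node (layer spec) of the architecture graph.
        self.layer_embed = nn.Linear(3, embed_dim)
        # Emits a flattened weight matrix and bias for that layer.
        self.weight_head = nn.Sequential(
            nn.Linear(embed_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, max_width * max_width + max_width),
        )

    def generate(self, widths):
        """Generate per-layer (W, b) for a policy MLP with the given hidden widths."""
        dims = [self.obs_dim, *widths, self.act_dim]
        params = []
        for i, (d_in, d_out) in enumerate(zip(dims[:-1], dims[1:])):
            spec = torch.tensor([d_in, d_out, i], dtype=torch.float32)
            flat = self.weight_head(self.layer_embed(spec))
            W = flat[: self.max_width * self.max_width].view(self.max_width, self.max_width)
            b = flat[self.max_width * self.max_width:]
            # Crop the generated block to the requested layer size.
            params.append((W[:d_out, :d_in], b[:d_out]))
        return params

    def act(self, obs, widths):
        h = obs
        params = self.generate(widths)
        for i, (W, b) in enumerate(params):
            h = h @ W.t() + b
            if i < len(params) - 1:
                h = torch.relu(h)
        return torch.tanh(h)  # bounded actions


# Usage: during off-policy training, sample target architectures of varying width;
# at deployment, keep only the generated weights for the size that fits memory.
hyper = HyperPolicy(obs_dim=8, act_dim=2)
obs = torch.randn(4, 8)
for widths in ([8, 8], [32, 32], [64]):
    action = hyper.act(obs, widths)  # shape (4, 2)
```

Because gradients flow from the generated weights back into the shared hypernetwork, training many target architectures costs roughly the same number of environment samples as training one, which is the sample-efficiency property the abstract refers to.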