Recently, Differentiable Architecture Search (DARTS) has become one of the most popular Neural Architecture Search (NAS) methods, successfully applied in supervised learning (SL). However, its applications in other domains, in particular reinforcement learning (RL), have seldom been studied. This is due in part to RL possessing a significantly different optimization paradigm from SL, especially with regard to the notion of replay data, which is continually generated via inference in RL. In this paper, we introduce RL-DARTS, one of the first applications of end-to-end DARTS in RL to search for convolutional cells, applied to the challenging, infinitely procedurally generated Procgen benchmark. We demonstrate that the benefits of DARTS become amplified when applied to RL, namely search efficiency in terms of time and compute, as well as simple integration with complex preexisting RL code: the image encoder is replaced with a DARTS supernet, compatible with both off-policy and on-policy RL algorithms. At the same time, RL-DARTS provides one of the first extensive studies of DARTS outside of the standard fixed-dataset setting in SL. We show that throughout training, the supernet gradually learns better cells, leading to alternative architectures that are highly competitive with manually designed policies, while also verifying previous design choices for RL policies.
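To make the integration point concrete, the following is a minimal sketch (not the paper's actual code) of how an RL policy's image encoder could be replaced by a DARTS-style supernet whose candidate operations are mixed by softmax-weighted architecture parameters; the class names (MixedOp, SupernetEncoder), the candidate-operation set, and the use of PyTorch are illustrative assumptions.

```python
# Minimal, assumed sketch of a DARTS supernet used as an RL image encoder.
import torch
import torch.nn as nn

class MixedOp(nn.Module):
    """Softmax-weighted mixture over a small set of candidate operations."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),  # 3x3 conv
            nn.Conv2d(channels, channels, 5, padding=2),  # 5x5 conv
            nn.Identity(),                                 # skip connection
        ])
        # Architecture parameters ("alphas"), trained jointly with the weights.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        w = torch.softmax(self.alpha, dim=0)
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

class SupernetEncoder(nn.Module):
    """Drop-in replacement for a fixed convolutional encoder in an RL policy."""
    def __init__(self, in_channels=3, channels=32, num_cells=3):
        super().__init__()
        self.stem = nn.Conv2d(in_channels, channels, 3, padding=1)
        self.cells = nn.ModuleList([MixedOp(channels) for _ in range(num_cells)])
        self.act = nn.ReLU()

    def forward(self, obs):
        x = self.act(self.stem(obs))
        for cell in self.cells:
            x = self.act(cell(x))
        return x.flatten(start_dim=1)  # features for the policy/value heads
```

In this sketch the encoder's output feeds the usual policy and value heads unchanged, so the surrounding RL algorithm (on-policy or off-policy) trains the architecture parameters alongside the network weights without modification to its training loop.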