We introduce RL-DARTS, one of the first applications of Differentiable Architecture Search (DARTS) in reinforcement learning (RL), used to search for convolutional cells on the Procgen benchmark. We outline the initial difficulties of applying neural architecture search techniques in RL, and demonstrate that by simply replacing the image encoder with a DARTS supernet, our search method is sample-efficient, requires minimal extra compute, and is compatible with both off-policy and on-policy RL algorithms, needing only minor changes to preexisting code. Surprisingly, we find that the supernet can be used as an actor for inference to generate replay data in standard RL training loops, and can thus be trained end-to-end. Throughout this training process, we show that the supernet gradually learns better cells, yielding alternative architectures that are highly competitive with manually designed policies while also validating previous design choices for RL policies.
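The core DARTS mechanism referenced above, relaxing a discrete choice of operations into a softmax-weighted mixture with trainable architecture logits, can be sketched minimally as follows. This is a hedged illustration, not the paper's implementation: the candidate operation set, the `mixed_op`/`discretize` names, and the use of NumPy stand-ins for convolutional ops are all assumptions made for clarity.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over architecture logits.
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Hypothetical minimal candidate-op set for one edge of a cell.
# Real DARTS cells use convolutions, pooling, skip connections, etc.
OPS = {
    "identity": lambda x: x,
    "scale2":   lambda x: 2.0 * x,          # stand-in for a learned conv op
    "zero":     lambda x: np.zeros_like(x),  # "no connection" op
}

def mixed_op(x, alpha):
    """DARTS mixed operation: softmax-weighted sum of all candidate ops.

    alpha holds trainable architecture logits, one per candidate op;
    during search these are optimized jointly with the network weights.
    """
    w = softmax(alpha)
    return sum(wi * op(x) for wi, op in zip(w, OPS.values()))

def discretize(alpha):
    # After search, the discrete cell keeps only the strongest op per edge.
    return list(OPS.keys())[int(np.argmax(alpha))]
```

In the RL setting described above, the supernet built from such mixed operations replaces the policy's image encoder, so the same RL loss that trains the weights also shapes the architecture logits.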