In this paper, we investigate a fundamental question: to what extent are gradient-based neural architecture search (NAS) techniques applicable to RL? Using the original DARTS as a convenient baseline, we find that the discovered discrete architectures achieve up to 250% of the performance of manually designed architectures, on both discrete and continuous action-space environments and across off-policy and on-policy RL algorithms, at only 3x more computation time. Furthermore, through numerous ablation studies, we systematically verify that DARTS not only correctly upweights operations during its supernet phase, but also gradually improves the resulting discrete cells, up to 30x more efficiently than random search, suggesting that DARTS is a surprisingly effective tool for improving architectures in RL.