Researchers have demonstrated that Deep Reinforcement Learning (DRL) is a powerful tool for finding policies that perform well on complex robotic systems. However, these policies are often unpredictable and can induce highly variable behavior when evaluated with only slightly different initial conditions. Training considerations constrain DRL algorithm designs in that most algorithms must use stochastic policies during training. The policy used during deployment, however, can be, and frequently is, a deterministic one that uses the Maximum Likelihood Action (MLA) at each step. In this work, we show that a direct random search is very effective at fine-tuning DRL policies by directly optimizing them using deterministic rollouts. We illustrate this across a large collection of reinforcement learning environments, using a wide variety of policies obtained from different algorithms. Our results show that this method yields more consistent and higher-performing agents on the environments we tested. Furthermore, we demonstrate how this method can be used to extend our previous work on shrinking the dimensionality of the reachable state space of closed-loop systems run under Deep Neural Network (DNN) policies.
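The fine-tuning procedure described above, a direct random search over policy parameters scored with deterministic rollouts, can be sketched as follows. This is a minimal illustration only: the toy objective `deterministic_return` stands in for running the MLA policy in an environment and summing rewards, and the function names and hyperparameters are assumptions for the sketch, not the paper's implementation.

```python
import numpy as np

# Illustrative target; in practice the "return" comes from running the
# deterministic (MLA) policy in the environment, not from a closed form.
TARGET = np.array([1.0, -2.0, 0.5])

def deterministic_return(theta):
    """Stand-in for a deterministic rollout of a parameterized policy:
    higher is better, maximized at theta = TARGET."""
    return -float(np.sum((theta - TARGET) ** 2))

def random_search_finetune(theta, n_iters=300, n_dirs=8, step=0.1, noise=0.1, seed=0):
    """Basic random search fine-tuning: probe +/- random perturbations of
    the current parameters with deterministic rollouts, then step along
    the resulting finite-difference direction (an ARS-style update)."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta, dtype=float)
    for _ in range(n_iters):
        deltas = rng.standard_normal((n_dirs, theta.size))
        grad_est = np.zeros_like(theta)
        for d in deltas:
            r_plus = deterministic_return(theta + noise * d)
            r_minus = deterministic_return(theta - noise * d)
            grad_est += (r_plus - r_minus) * d
        theta += step * grad_est / n_dirs  # ascend the estimated return gradient
    return theta
```

Because every rollout is deterministic, each candidate perturbation is scored exactly rather than through a noisy stochastic-policy sample, which is what makes this simple search effective as a post-training refinement step.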