For model-free deep reinforcement learning of quadruped fall recovery, the initialization of robot configurations is crucial to data efficiency and robustness. This work focuses on algorithmic improvements to data efficiency and robustness simultaneously through automatic discovery of initial states, which is achieved by our proposed K-Access algorithm based on accessibility metrics. Specifically, we formulate accessibility metrics that measure the difficulty of transitions between two arbitrary states, and propose a novel K-Access algorithm for state-space clustering that automatically discovers the centroids of the static-pose clusters based on these metrics. Using the discovered centroidal static poses as initial states improves data efficiency by reducing redundant exploration, and enhances robustness because exploration from the centroids to the sampled static poses is easy. We conducted extensive validation using Bittle, an 8-DOF quadrupedal robot. Compared to random initialization, the learning curve of our proposed method converges much faster, requiring only around 60% of the training episodes. With our method, the robot successfully recovers standing poses in 99.4% of tests within 3 seconds.
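The abstract does not spell out the K-Access update rules or the form of the accessibility metrics, so the following is only a minimal sketch of the general idea: clustering static poses with a pairwise transition cost and taking the lowest-cost member of each cluster as its centroid. It is a generic medoid-style clustering, not the authors' algorithm. The function name `cluster_by_cost`, the farthest-first initialization, the toy 8-dimensional poses, and the Euclidean joint-space distance used as the cost are all assumptions made for illustration.

```python
import numpy as np

def cluster_by_cost(cost, k, n_iter=50):
    """Medoid-style clustering on a pairwise cost matrix.

    cost[i, j] is a hypothetical difficulty of moving from pose i to pose j
    (it may be asymmetric). Returns medoid indices and a label per pose.
    """
    cost = np.asarray(cost, dtype=float)

    # Farthest-first initialization (an assumption, chosen for determinism):
    # start from the pose with the lowest total outgoing cost, then keep
    # adding the pose that is hardest to reach from the current medoids.
    medoids = [int(np.argmin(cost.sum(axis=1)))]
    while len(medoids) < k:
        reach = cost[medoids, :].min(axis=0)
        medoids.append(int(np.argmax(reach)))
    medoids = np.array(medoids)

    for _ in range(n_iter):
        # Assign each pose to the medoid from which it is cheapest to reach.
        labels = np.argmin(cost[medoids, :], axis=0)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # New medoid: the member with the lowest summed cost
                # of reaching every other member of its cluster.
                outgoing = cost[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[np.argmin(outgoing)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids

    labels = np.argmin(cost[medoids, :], axis=0)
    return medoids, labels

# Toy data: two well-separated groups of 8-DOF joint configurations.
rng = np.random.default_rng(0)
poses = np.vstack([
    rng.normal(0.0, 0.05, size=(10, 8)),
    rng.normal(1.5, 0.05, size=(10, 8)),
])

# Stand-in cost: Euclidean distance in joint space (symmetric placeholder,
# not the accessibility metric from the paper).
cost = np.linalg.norm(poses[:, None, :] - poses[None, :, :], axis=-1)

medoids, labels = cluster_by_cost(cost, k=2)
initial_states = poses[medoids]  # candidate initial poses for training episodes
```

In this sketch the medoids play the role of the centroidal static poses: each one is an actual sampled pose, so it can be used directly as an episode's initial state. Because the cost is measured from the medoid to its members, an asymmetric cost could be substituted without changing the code.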