This report introduces our winning solution for the real-robot phase of the Real Robot Challenge (RRC) 2022. The goal of this year's challenge is to solve dexterous manipulation tasks with offline reinforcement learning (RL) or imitation learning. To this end, participants are provided with datasets containing dozens of hours of robotic data. For each task, an expert dataset and a mixed dataset are provided. In our experiments, when learning from the expert datasets, we find that standard Behavioral Cloning (BC) outperforms state-of-the-art offline RL algorithms. When learning from the mixed datasets, BC performs poorly, as expected, while, surprisingly, offline RL also performs suboptimally, failing to match the average performance of the baseline model used to collect the datasets. To remedy this, motivated by the strong performance of BC on the expert datasets, we use a semi-supervised classification technique to filter the expert data out of the mixed datasets, and subsequently perform BC on this extracted subset. To further improve results, in all settings we use a simple data augmentation method that exploits the geometric symmetry of the RRC physical robotic environment. Our submitted BC policies each surpass the mean return of their respective raw datasets, and the policies trained on the filtered mixed datasets come close to matching the performance of those trained on the expert datasets.
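As an illustration of the symmetry-based augmentation mentioned above: the RRC TriFinger platform has three fingers arranged at 120° intervals about the vertical axis, so each transition can be mapped to an equivalent one by rotating positional quantities about that axis. The sketch below is a minimal, hypothetical version assuming observations contain xyz positions; the function names and data layout are ours, not the report's.

```python
import numpy as np

def rotate_z(points, angle):
    """Rotate an (N, 3) array of xyz points about the vertical (z) axis."""
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    return points @ R.T

def augment(positions, k):
    """Map positional features to their equivalent under a k * 120-degree
    rotation (k in {1, 2}), exploiting the platform's three-fold symmetry.
    In practice the same rotation (with the finger ordering permuted
    accordingly) would also be applied to the action vector."""
    return rotate_z(positions, k * 2.0 * np.pi / 3.0)
```

Because the symmetry group has order three, each recorded transition yields two additional synthetic transitions, tripling the effective dataset size without collecting new robot data.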