Imitation learning, in which an agent learns from demonstrations, has been studied and advanced for sequential decision-making tasks where a reward function is not predefined. However, imitation learning methods still require numerous expert demonstration samples to successfully imitate an expert's behavior. To improve sample efficiency, we utilize self-supervised representation learning, which can generate abundant training signals from the given data. In this study, we propose a self-supervised representation-based adversarial imitation learning method that learns state and action representations which are robust to diverse distortions and temporally predictive, for non-image control tasks. In particular, in contrast to existing self-supervised learning methods for tabular data, we propose a different corruption method that makes state and action representations robust to diverse distortions. We observe, both theoretically and empirically, that learning an informative feature manifold with lower sample complexity significantly improves the performance of imitation learning. The proposed method achieves a 39% relative improvement over existing adversarial imitation learning methods on MuJoCo in a setting limited to 100 expert state-action pairs. Moreover, we conduct comprehensive ablations and additional experiments using demonstrations of varying optimality to provide insight into a range of factors.
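As a rough illustration of the kind of corruption-based augmentation described above, the sketch below corrupts a random subset of features in each state-action vector by resampling values from the batch's empirical marginals (in the style of SCARF-like tabular augmentation). The function name, the resampling scheme, and the corruption rate are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def corrupt_features(batch, corruption_rate=0.3, rng=None):
    """Corrupt a random subset of features per row by resampling each
    corrupted entry from that feature's empirical marginal distribution.
    `batch` is an (N, D) array of concatenated state-action vectors.
    This is a hypothetical sketch, not the paper's exact corruption."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = batch.shape
    # Boolean mask of entries to corrupt.
    mask = rng.random((n, d)) < corruption_rate
    # Replacement values: shuffle each feature column independently,
    # so replacements follow the per-feature empirical distribution.
    replacements = np.stack(
        [rng.permutation(batch[:, j]) for j in range(d)], axis=1
    )
    return np.where(mask, replacements, batch), mask

# Usage: the original batch and its corrupted view form a positive pair
# for a contrastive or predictive self-supervised objective.
rng = np.random.default_rng(0)
batch = rng.standard_normal((128, 20))   # 128 state-action pairs, 20 dims
view, mask = corrupt_features(batch, corruption_rate=0.3, rng=rng)
```

Only the masked entries differ from the original batch; unmasked entries pass through unchanged, so the corrupted view stays on the data manifold feature-wise.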