Deep imitation learning requires a large number of expert demonstrations, which are not always easy to obtain, especially for complex tasks. One way to overcome this shortage of labels is data augmentation. However, augmentation cannot be easily applied to control tasks because of the sequential nature of the problem. In this work, we introduce a novel augmentation method that preserves the success of the augmented trajectories. To achieve this, we introduce a semi-supervised correction network that aims to correct distorted expert actions. To adequately test the abilities of the correction network, we develop an adversarial data-augmented imitation architecture that trains an imitation agent using synthetic experts. Additionally, we introduce a metric to measure diversity in trajectory datasets. Experiments show that our data augmentation strategy can improve the accuracy and convergence time of adversarial imitation while preserving the diversity between the generated and real trajectories.
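To illustrate why naive augmentation is problematic in sequential control, the following minimal sketch perturbs the actions of a toy expert and compares the final error with and without a correction step. Everything in it is an assumption made for illustration: the 1-D additive dynamics, the names `rollout`, `naive_augment`, and `corrected_augment`, and the hand-written correction rule, which only stands in for the idea of correcting distorted actions and is not the learned semi-supervised correction network described above.

```python
import random


def rollout(actions, start=0.0):
    """Toy dynamics (an assumption): next_state = state + action."""
    state = start
    for action in actions:
        state += action
    return state


def naive_augment(actions, scale, rng):
    """Add independent Gaussian noise to every expert action."""
    return [a + rng.gauss(0.0, scale) for a in actions]


def corrected_augment(actions, scale, rng, goal, start=0.0):
    """Distort all but the last action, then let the last action close the gap.

    Hand-written stand-in for a correction step; NOT the learned network.
    """
    distorted = naive_augment(actions[:-1], scale, rng)
    state = rollout(distorted, start)
    return distorted + [goal - state]


GOAL, STEPS = 1.0, 50
expert = [GOAL / STEPS] * STEPS  # expert reaches the goal in equal steps
rng = random.Random(0)           # fixed seed for repeatability

# Per-step noise accumulates over the trajectory, so the naive copy misses the goal.
naive_error = abs(rollout(naive_augment(expert, 0.1, rng)) - GOAL)
# The corrected copy is still a distorted trajectory, but it ends at the goal.
corrected_error = abs(rollout(corrected_augment(expert, 0.1, rng, GOAL)) - GOAL)
print(f"naive: {naive_error:.3f}, corrected: {corrected_error:.3e}")
```

In this toy setting the corrected trajectory differs from the expert at almost every step yet still succeeds, which is the property the augmentation method aims to preserve.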