In this paper, we consider the problem of autonomous driving using semi-supervised imitation learning. In particular, both labeled and unlabeled demonstrations are leveraged during training by estimating the quality of each unlabeled demonstration. If the provided demonstrations are corrupted and have a low signal-to-noise ratio, the performance of the imitation learning agent can degrade significantly. To mitigate this problem, we propose a method called semi-supervised imitation learning (SSIL). SSIL first learns to discriminate and evaluate the reliability of each state-action pair in the unlabeled demonstrations, assigning higher reliability values to pairs that are similar to the labeled expert demonstrations. This reliability value is called leverage. After this discrimination process, the labeled demonstrations and the unlabeled demonstrations, together with their estimated leverage values, are used to train the policy in a semi-supervised manner. Experiments using unlabeled trajectories of mixed quality demonstrate the validity of the proposed algorithm. Moreover, hardware experiments with an RC car show that the proposed method can be applied to real-world applications.
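The two-stage idea described above (estimate a leverage for each unlabeled state-action pair, then train the policy on all data weighted by leverage) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the Gaussian-kernel similarity used as the leverage estimator, the linear policy fitted by weighted least squares, and the names `estimate_leverage`, `fit_weighted_linear_policy`, `train_ssil_sketch`, `bandwidth`, and `ridge` are all hypothetical stand-ins.

```python
import numpy as np


def estimate_leverage(expert_sa, unlabeled_sa, bandwidth=1.0):
    """Score each unlabeled state-action pair in (0, 1] by similarity to expert pairs.

    Assumption: a Gaussian kernel on the distance to the nearest expert pair
    stands in for the learned reliability estimator described in the abstract.
    """
    # Pairwise squared distances, shape (n_unlabeled, n_expert).
    diff = unlabeled_sa[:, None, :] - expert_sa[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Leverage is the kernel value at the closest expert pair.
    return np.exp(-sq_dist.min(axis=1) / (2.0 * bandwidth ** 2))


def fit_weighted_linear_policy(states, actions, weights, ridge=1e-6):
    """Weighted least-squares fit of a linear policy a = s @ W (illustrative only)."""
    weighted_states = states * weights[:, None]
    gram = states.T @ weighted_states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(gram, weighted_states.T @ actions)


def train_ssil_sketch(expert_s, expert_a, unl_s, unl_a, bandwidth=1.0):
    """Stage 1: estimate leverage. Stage 2: leverage-weighted behavior cloning."""
    expert_sa = np.hstack([expert_s, expert_a])
    unl_sa = np.hstack([unl_s, unl_a])
    leverage = estimate_leverage(expert_sa, unl_sa, bandwidth)

    # Labeled expert pairs get weight 1; unlabeled pairs get their leverage.
    states = np.vstack([expert_s, unl_s])
    actions = np.vstack([expert_a, unl_a])
    weights = np.concatenate([np.ones(len(expert_s)), leverage])
    return fit_weighted_linear_policy(states, actions, weights), leverage
```

In this toy setting, unlabeled pairs that lie far from every expert pair receive a leverage near zero and contribute little to the fitted policy, while pairs close to expert behavior are used almost as if they were labeled. A learned discriminator and a neural network policy would replace both stand-ins in a realistic driving setup.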