We study visual domain transfer for end-to-end imitation learning in a realistic and challenging setting where target-domain data are strictly off-policy, expert-free, and scarce. We first provide a theoretical analysis showing that the target-domain imitation loss is upper-bounded by the source-domain loss plus a state-conditional latent KL divergence between the source and target observation models. Guided by this result, we propose State-Conditional Adversarial Learning (SCAL), an off-policy adversarial framework that aligns latent distributions conditioned on the system state, using a discriminator-based estimator of the conditional KL term. Experiments on visually diverse autonomous driving environments built on the BARC-CARLA simulator demonstrate that SCAL achieves robust transfer and strong sample efficiency.
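The idea of a discriminator-based estimator of a state-conditional KL term can be illustrated with a toy sketch. The example below is not the SCAL implementation; it only demonstrates the standard density-ratio trick under assumptions of our own: one-dimensional latents with z | s ~ N(s + shift, 1), the same state distribution in both domains, equal sample counts, and a logistic-regression discriminator on the features (z, s, 1). All function names are hypothetical, and the KL direction (target relative to source) is chosen for illustration only. Under these assumptions the optimal logit equals log p_T(z|s) / p_S(z|s), so its average over target samples estimates the state-averaged conditional KL, whose true value here is 0.5^2 / 2 = 0.125.

```python
import math
import random


def sample_latents(n, shift, rng):
    """Draw (state, latent) pairs with z | s ~ N(s + shift, 1) (toy assumption)."""
    pairs = []
    for _ in range(n):
        s = rng.uniform(-1.0, 1.0)      # state, same distribution in both domains
        z = rng.gauss(s + shift, 1.0)   # latent depends on state and on the domain shift
        pairs.append((s, z))
    return pairs


def fit_discriminator(source, target, steps=200, lr=1.0):
    """Fit logistic regression on (z, s, 1) by gradient descent; label 1 = target."""
    data = [((z, s, 1.0), 0.0) for s, z in source]
    data += [((z, s, 1.0), 1.0) for s, z in target]
    w = [0.0, 0.0, 0.0]
    n = len(data)
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for x, y in data:
            logit = w[0] * x[0] + w[1] * x[1] + w[2] * x[2]
            p = 1.0 / (1.0 + math.exp(-logit))
            err = p - y                  # gradient of the cross-entropy w.r.t. the logit
            for i in range(3):
                grad[i] += err * x[i]
        w = [wi - lr * gi / n for wi, gi in zip(w, grad)]
    return w


def conditional_kl_estimate(w, target):
    """Average the discriminator logit over target samples.

    With balanced classes and a shared state distribution, the optimal logit is
    log p_T(z|s) / p_S(z|s), so this mean estimates E_s[KL(p_T(.|s) || p_S(.|s))].
    """
    return sum(w[0] * z + w[1] * s + w[2] for s, z in target) / len(target)


rng = random.Random(0)
source = sample_latents(2500, 0.0, rng)   # source-domain latents
target = sample_latents(2500, 0.5, rng)   # target-domain latents, shifted by 0.5
w = fit_discriminator(source, target)
kl_hat = conditional_kl_estimate(w, target)
print(f"estimated conditional KL: {kl_hat:.3f} (true value in this toy: 0.125)")
```

Feeding the state to the discriminator alongside the latent is what makes the estimate conditional: a discriminator that saw only z would instead measure the divergence between the marginal latent distributions, which can be nonzero even when the per-state distributions match if the two domains visit states differently. In an adversarial alignment setup, an encoder would be trained to drive this estimate toward zero, which the sketch does not attempt.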