This work focuses on learning useful and robust deep world models from multiple, possibly unreliable, sensors. We find that current methods do not sufficiently encourage a shared representation between modalities; this can cause poor performance on downstream tasks and over-reliance on specific sensors. As a solution, we contribute a new multi-modal deep latent state-space model, trained using a mutual information lower bound. The key innovation is a specially designed density ratio estimator that encourages consistency between the latent codes of each modality. We task our method with learning policies (in a self-supervised manner) on multi-modal Natural MuJoCo benchmarks and a challenging Table Wiping task. Experiments show our method significantly outperforms state-of-the-art deep reinforcement learning methods, particularly in the presence of missing observations.
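The abstract does not spell out the form of the density ratio estimator or the mutual information lower bound. As a purely illustrative sketch (not the paper's actual objective), a common instantiation of such a bound is an InfoNCE-style contrastive objective, where a similarity critic between the latent codes of two modalities serves as the density ratio estimator; matched pairs from the same underlying state should score higher than mismatched ones. The function name, cosine critic, and temperature below are all assumptions for illustration.

```python
import numpy as np

def infonce_bound(z_a, z_b, temperature=0.1):
    """Illustrative InfoNCE-style MI lower bound between two modalities.

    z_a, z_b: (batch, dim) latent codes; row i of each comes from the
    same underlying state, so the diagonal holds the positive pairs.
    """
    # Normalize so the critic is a cosine similarity (one common choice).
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = (z_a @ z_b.T) / temperature  # (batch, batch) critic scores

    # Log-softmax over each row: positives on the diagonal, the rest
    # of the batch acting as negatives.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # InfoNCE lower-bounds I(z_a; z_b), saturating at log(batch size).
    return np.diag(log_probs).mean() + np.log(len(z_a))
```

In this sketch, maximizing the bound pulls the two modalities' codes for the same state together, which is one way to encourage the kind of cross-modal consistency the abstract describes: aligned latent pairs yield a higher bound than mismatched ones.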