We present a slot-wise, object-based transition model that decomposes a scene into objects, aligns them (with respect to a slot-wise object memory) to maintain a consistent order across time, and predicts how those objects evolve over successive frames. The model is trained end-to-end without supervision using losses at the level of the object-structured representation rather than pixels. Thanks to its alignment module, the model deals properly with two issues that are not handled satisfactorily by other transition models, namely object persistence and object identity. We show that the combination of an object-level loss and correct object alignment over time enables the model to outperform a state-of-the-art baseline, and allows it to deal well with object occlusion and re-appearance in partially observable environments.
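The core of the alignment step described above is matching each predicted slot to its counterpart in the slot-wise memory so that every object keeps the same slot index over time. The paper does not specify the matching procedure here, so the following is only a minimal illustrative sketch: it aligns slots to memory by maximizing total pairwise cosine similarity over permutations (brute force, fine for small slot counts; the Hungarian algorithm would be used in practice). The function name `align_slots` and all shapes are hypothetical.

```python
import itertools

import numpy as np


def align_slots(memory: np.ndarray, slots: np.ndarray) -> np.ndarray:
    """Reorder `slots` (K x D) so that slot k best matches memory slot k.

    Illustrative brute-force matching over permutations; a real
    implementation would use the Hungarian algorithm for larger K.
    """
    k = len(slots)
    # Normalize rows so the dot product is cosine similarity.
    m = memory / np.linalg.norm(memory, axis=1, keepdims=True)
    s = slots / np.linalg.norm(slots, axis=1, keepdims=True)
    sim = m @ s.T  # (K, K) pairwise similarity
    best = max(
        itertools.permutations(range(k)),
        key=lambda p: sum(sim[i, p[i]] for i in range(k)),
    )
    return slots[list(best)]


# Toy check: slots arrive as a permuted copy of the memory;
# alignment restores the memory's ordering.
memory = np.eye(3)
slots = memory[[2, 0, 1]]
aligned = align_slots(memory, slots)
assert np.allclose(aligned, memory)
```

With a consistent slot order in place, an object-level loss can then be computed slot-by-slot between the predicted and observed representations, which is what lets the model penalize errors per object rather than per pixel.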