The idea behind object-centric representation learning is that natural scenes can be better modeled as compositions of objects and their relations rather than as distributed representations. This inductive bias can be injected into neural networks to potentially improve systematic generalization and downstream-task performance in scenes with multiple objects. In this paper, we train state-of-the-art unsupervised models on five common multi-object datasets and evaluate segmentation metrics and downstream object property prediction. In addition, we study generalization and robustness by investigating settings where either a single object is out of distribution -- e.g., has an unseen color, texture, or shape -- or global properties of the scene are altered -- e.g., by occlusion, cropping, or an increased number of objects. From our experimental study, we find object-centric representations to be useful for downstream tasks and generally robust to most distribution shifts affecting objects. However, when the distribution shift affects the input in a less structured manner, robustness in terms of segmentation and downstream task performance may vary significantly across models and distribution shifts.