Shepherding Slots to Objects: Towards Stable and Robust Object-Centric Learning

Jinwoo Kim, Janghyuk Choi, Ho-Jin Choi, Seon Joo Kim

Object-centric learning (OCL) aspires to a general and compositional understanding of scenes by representing a scene as a collection of object-centric representations. OCL has also been extended to multi-view image and video datasets, where geometric or temporal information in the multi-image data makes it possible to apply various data-driven inductive biases. Single-view images carry less information about how to disentangle a given scene than videos or multi-view images do. Hence, owing to the difficulty of applying inductive biases, OCL for single-view images remains challenging, resulting in inconsistent learning of object-centric representations. To this end, we introduce a novel OCL framework for single-view images, SLot Attention via SHepherding (SLASH), which consists of two simple-yet-effective modules on top of Slot Attention. The new modules, Attention Refining Kernel (ARK) and Intermediate Point Predictor and Encoder (IPPE), respectively prevent slots from being distracted by background noise and indicate locations for slots to focus on, facilitating the learning of object-centric representations. We also propose a weak semi-supervision approach for OCL, while the proposed framework can be used without any auxiliary annotation during inference. Experiments show that the proposed method enables consistent learning of object-centric representations and achieves strong performance across four datasets. Code is available at https://github.com/object-understanding/SLASH.
