Object-centric representations form the basis of human perception, and enable us to reason about the world and to systematically generalize to new settings. Currently, most works on unsupervised object discovery focus on slot-based approaches, which explicitly separate the latent representations of individual objects. While the result is easily interpretable, it usually requires the design of involved architectures. In contrast, we propose a comparatively simple approach, the Complex AutoEncoder (CAE), that creates distributed object-centric representations. Following a coding scheme theorized to underlie object representations in biological neurons, the CAE's complex-valued activations represent two messages: their magnitudes express the presence of a feature, while the relative phase differences between neurons express which features should be bound together to create joint object representations. In contrast to previous approaches using complex-valued activations for object discovery, we present a fully unsupervised approach that is trained end-to-end, resulting in significant improvements in performance and efficiency. Further, we show that the CAE achieves competitive or better unsupervised object discovery performance on simple multi-object datasets compared to a state-of-the-art slot-based approach while being up to 100 times faster to train.
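To make the two messages concrete, here is a minimal Python sketch. It assumes each activation is a single complex number z = m·e^{iφ}, where |z| is read as feature presence and arg(z) as a binding label. The helpers `encode` and `group_by_phase` and the parameters `threshold` and `tol` are hypothetical illustrations, not the CAE's architecture or its evaluation procedure. A simple greedy phase-distance grouping stands in for whatever clustering one would actually apply to the phases.

```python
import cmath
import math


def encode(presence, phase):
    """Hypothetical helper: pack a feature's presence (magnitude) and its
    binding label (phase, in radians) into one complex activation."""
    return cmath.rect(presence, phase)


def group_by_phase(activations, threshold=0.1, tol=0.5):
    """Hypothetical greedy grouping: features whose magnitude is below
    `threshold` are treated as absent; each remaining feature joins the first
    group whose reference phase lies within `tol` radians (on the circle),
    otherwise it starts a new group."""
    groups = []  # list of (reference_phase, member_indices)
    for i, z in enumerate(activations):
        magnitude, phase = cmath.polar(z)
        if magnitude < threshold:
            continue  # magnitude too small: feature considered not present
        for ref, members in groups:
            # Circular distance, so phases near +pi and -pi count as close.
            diff = abs(math.remainder(phase - ref, 2 * math.pi))
            if diff < tol:
                members.append(i)
                break
        else:
            groups.append((phase, [i]))
    return [members for _, members in groups]


# Five features: two objects (phases near 0.1 and near +/-pi) plus one absent feature.
activations = [
    encode(1.0, 0.1),
    encode(0.8, 3.0),
    encode(0.9, 0.2),
    encode(0.05, 1.5),  # low magnitude: ignored
    encode(1.0, -3.1),  # wraps around to the object at phase 3.0
]
print(group_by_phase(activations))  # [[0, 2], [1, 4]]
```

The grouping depends only on phase differences between features, not on absolute phase values, mirroring the abstract's point that relative phase encodes which features belong together.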