Recent studies on unsupervised object detection based on spatial attention have achieved promising results. Models such as AIR and SPAIR output "what" and "where" latent variables that represent the attributes and locations of objects in a scene, respectively. Most previous studies concentrate on the "where" localization performance; however, we argue that acquiring "what" object attributes is also essential for representation learning. This paper presents GMAIR, a framework for unsupervised object detection that incorporates spatial attention and a Gaussian mixture prior in a unified deep generative model. GMAIR can locate objects in a scene and simultaneously cluster them without supervision. Furthermore, we analyze the "what" latent variables and the clustering process. Finally, we evaluate our model on the MultiMNIST and Fruit2D datasets and show that GMAIR achieves competitive results on localization and clustering compared to state-of-the-art methods.
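To make the clustering mechanism concrete, the following is a minimal PyTorch sketch of a Gaussian mixture prior over the per-object "what" latents, the ingredient that allows detected objects to be clustered without supervision. This is an illustrative sketch under stated assumptions, not GMAIR's actual implementation: the class name GaussianMixturePrior, the uniform mixing weights, and the parameters num_clusters and what_dim are hypothetical choices introduced here for exposition.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class GaussianMixturePrior(nn.Module):
    """Learnable Gaussian mixture prior over per-object 'what' latents.

    Hypothetical sketch; GMAIR's actual parameterization may differ.
    """

    def __init__(self, num_clusters: int, what_dim: int):
        super().__init__()
        # One learnable diagonal Gaussian (mean, log-variance) per cluster.
        self.mu = nn.Parameter(torch.randn(num_clusters, what_dim) * 0.1)
        self.log_var = nn.Parameter(torch.zeros(num_clusters, what_dim))

    def log_prob(self, z_what: torch.Tensor) -> torch.Tensor:
        """Mixture log-density log p(z) = log sum_k pi_k N(z | mu_k, sigma_k^2),
        assuming uniform mixing weights pi_k = 1/K."""
        z = z_what.unsqueeze(-2)                       # (..., 1, D)
        var = self.log_var.exp()                       # (K, D)
        # Per-cluster diagonal-Gaussian log-density, summed over latent dims.
        log_comp = -0.5 * (((z - self.mu) ** 2) / var
                           + self.log_var
                           + math.log(2 * math.pi)).sum(-1)  # (..., K)
        num_clusters = self.mu.shape[0]
        return torch.logsumexp(log_comp - math.log(num_clusters), dim=-1)

    def cluster_posterior(self, z_what: torch.Tensor) -> torch.Tensor:
        """Soft cluster assignment p(k | z_what); its argmax can serve as the
        unsupervised cluster label of a detected object."""
        z = z_what.unsqueeze(-2)
        var = self.log_var.exp()
        # Constants shared across clusters cancel inside the softmax.
        log_comp = -0.5 * (((z - self.mu) ** 2) / var + self.log_var).sum(-1)
        return F.softmax(log_comp, dim=-1)


# Usage: assign each detected object's "what" latent to a cluster.
prior = GaussianMixturePrior(num_clusters=10, what_dim=32)
z_what = torch.randn(5, 32)  # "what" latents of 5 detected objects
labels = prior.cluster_posterior(z_what).argmax(dim=-1)
```

In such a setup, the localization ("where") pathway is unchanged, while the KL term of the "what" latents is taken against this mixture instead of a single standard Gaussian, which is what induces the clustering behavior.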