Indoor scene classification has become an important task in perception modules and is widely used in various applications. However, challenges such as intra-category variability and inter-category similarity continue to limit model performance, motivating new types of features that yield a more meaningful scene representation. A semantic segmentation mask provides pixel-level information about the objects present in the scene, making it a promising source of information for a more meaningful local representation of the scene. Therefore, in this work, a novel approach is proposed that uses a semantic segmentation mask to obtain a 2D spatial layout of the object categories across the scene, designated segmentation-based semantic features (SSFs). These features encode, per object category, the pixel count as well as the 2D average position and the respective standard deviation values. Moreover, a two-branch network, GS2F2App, is also proposed, which combines CNN-based global features extracted from RGB images with the segmentation-based features derived from the proposed SSFs. GS2F2App was evaluated on two indoor scene benchmark datasets, SUN RGB-D and NYU Depth V2, achieving state-of-the-art results on both.
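To make the SSF description above concrete, the following is a minimal sketch, under assumed normalization choices, of how the per-category statistics (pixel count, 2D mean position, and per-axis standard deviation) could be computed from a semantic segmentation mask; the function name and normalization are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def compute_ssf(mask: np.ndarray, num_categories: int) -> np.ndarray:
    """Sketch of segmentation-based semantic features (SSFs).

    mask: (H, W) array of per-pixel category indices in [0, num_categories).
    Returns a flat vector with 5 values per category:
    normalized pixel count, mean x, mean y, std x, std y.
    """
    h, w = mask.shape
    ys, xs = np.indices((h, w))
    features = np.zeros((num_categories, 5), dtype=np.float32)
    for c in range(num_categories):
        sel = mask == c
        count = sel.sum()
        if count == 0:
            continue  # category absent from the scene: leave zeros
        cx, cy = xs[sel], ys[sel]
        features[c] = [
            count / (h * w),   # normalized pixel count (assumed normalization)
            cx.mean() / w,     # 2D average position, x (normalized)
            cy.mean() / h,     # 2D average position, y (normalized)
            cx.std() / w,      # standard deviation, x
            cy.std() / h,      # standard deviation, y
        ]
    return features.ravel()  # flat vector for the segmentation-based branch
```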
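For the two-branch design, the following is a minimal sketch of how global RGB features and the SSF vector could be fused for classification; the backbone choice, layer sizes, and fusion by concatenation are assumptions for illustration, not the exact GS2F2App architecture.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class TwoBranchSceneNet(nn.Module):
    """Sketch of a two-branch scene classifier: RGB CNN branch + SSF MLP branch."""

    def __init__(self, ssf_dim: int, num_classes: int):
        super().__init__()
        # Global branch: ImageNet-pretrained ResNet-50 without its classifier head
        # (assumes torchvision >= 0.13 for the weights API).
        backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
        self.global_branch = nn.Sequential(*list(backbone.children())[:-1])  # -> (B, 2048, 1, 1)
        # Segmentation-based branch: small MLP over the flat SSF vector.
        self.ssf_branch = nn.Sequential(
            nn.Linear(ssf_dim, 512), nn.ReLU(inplace=True),
            nn.Linear(512, 256), nn.ReLU(inplace=True),
        )
        self.classifier = nn.Linear(2048 + 256, num_classes)

    def forward(self, rgb: torch.Tensor, ssf: torch.Tensor) -> torch.Tensor:
        g = self.global_branch(rgb).flatten(1)   # CNN-based global features
        s = self.ssf_branch(ssf)                 # segmentation-based features
        return self.classifier(torch.cat([g, s], dim=1))
```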