We present 360-MLC, a self-training method based on multi-view layout consistency for finetuning monocular room-layout models using unlabeled 360° images only. This is valuable in practical scenarios where a pre-trained model must be adapted to a new data domain without any ground-truth annotations. Our simple yet effective assumption is that multiple layout estimations of the same scene must define a consistent geometry regardless of camera position. Based on this idea, we leverage a pre-trained model to project estimated layout boundaries from several camera views into 3D world coordinates. We then re-project them back to spherical coordinates and build a probability function, from which we sample pseudo-labels for self-training. To handle unconfident pseudo-labels, we use the variance of the re-projected boundaries as an uncertainty value to weight each pseudo-label in our loss function during training. In addition, since ground-truth annotations are available neither during training nor testing, we leverage the entropy of multiple layout estimations as a quantitative metric to measure the geometric consistency of the scene, allowing us to evaluate any layout estimator for hyper-parameter tuning, including model selection, without ground-truth annotations. Experimental results show that our solution achieves favorable performance against state-of-the-art methods when self-training from three publicly available source datasets to a unique, newly labeled dataset consisting of multiple views of the same scenes.
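To make the pipeline concrete, the following is a minimal sketch (not the authors' code) of the multi-view consistency idea described above. It assumes an equirectangular camera model, known camera poses (R, t) per view, a fixed camera height, and per-column floor-boundary elevations `phi` predicted by a pre-trained monocular layout model; the image width `W`, the camera height, and the inverse-variance weighting scheme are illustrative assumptions.

```python
# Sketch of multi-view layout consistency: lift per-view boundary estimates
# to 3D, re-project them into a reference view, aggregate a per-column
# pseudo-label with an uncertainty value, and weight the training loss.
import numpy as np

W = 1024            # number of panorama columns (assumed)
CAM_HEIGHT = 1.6    # assumed camera height above the floor, in meters


def column_azimuths(width=W):
    """Azimuth angle of every image column in an equirectangular panorama."""
    return (np.arange(width) + 0.5) / width * 2 * np.pi - np.pi


def boundary_to_world(phi, R, t, h=CAM_HEIGHT):
    """Lift a per-column floor boundary (elevation phi < 0) to 3D world points."""
    theta = column_azimuths(len(phi))
    # Unit ray per column: x right, y up, z forward.
    d = np.stack([np.cos(phi) * np.sin(theta),
                  np.sin(phi),
                  np.cos(phi) * np.cos(theta)], axis=1)
    scale = -h / d[:, 1]                 # intersect each ray with floor plane y = -h
    pts_cam = d * scale[:, None]
    return pts_cam @ R.T + t             # camera -> world


def reproject_to_view(pts_world, R, t):
    """Re-project world points into a reference view's spherical coordinates."""
    p = (pts_world - t) @ R              # world -> camera (R is orthonormal)
    theta = np.arctan2(p[:, 0], p[:, 2])
    phi = np.arcsin(p[:, 1] / np.linalg.norm(p, axis=1))
    return theta, phi


def pseudo_label(all_theta, all_phi, width=W):
    """Aggregate re-projected boundaries from all views per column:
    mean as pseudo-label, standard deviation as uncertainty."""
    cols = ((all_theta + np.pi) / (2 * np.pi) * width).astype(int) % width
    mu, sigma = np.zeros(width), np.zeros(width)
    for c in range(width):
        phis = all_phi[cols == c]
        if len(phis):
            mu[c], sigma[c] = phis.mean(), phis.std()
    return mu, sigma


def weighted_l1(pred, mu, sigma, eps=1e-3):
    """Uncertainty-weighted L1 loss: high-variance columns contribute less."""
    w = 1.0 / (sigma + eps)
    return np.mean(w / w.mean() * np.abs(pred - mu))
```

Under the same assumptions, the per-column spread of re-projected boundaries also supports the label-free evaluation idea: treating each column's re-projected elevations as an empirical distribution, lower entropy indicates a more geometrically consistent estimator, which can serve as a proxy metric for hyper-parameter tuning and model selection when no ground truth is available.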