This report surveys advances in deep learning-based modeling techniques for the analysis and synthesis of 3D indoor scenes. We describe the different representations used for indoor scenes, the indoor scene datasets available for research in these areas, and notable works that apply machine learning models to scene modeling tasks built on these representations. With respect to analysis, we focus on four basic scene understanding tasks: 3D object detection, 3D scene segmentation, 3D scene reconstruction and 3D scene similarity. For synthesis, we mainly discuss neural scene synthesis works, while also highlighting model-driven methods that allow for human-centric, progressive scene synthesis. We identify the challenges involved in modeling scenes for these tasks and the kind of machinery that needs to be developed to adapt to the data representation and to the task setting in general. For each of these tasks, we provide a comprehensive summary of state-of-the-art works along axes such as the choice of data representation, backbone, evaluation metric, input and output, giving an organized review of the literature. Towards the end, we discuss research directions that have the potential to directly affect how users interact and engage with these virtual scene models, making them an integral part of the metaverse.