We explore how a general AI algorithm can be used for 3D scene understanding to reduce the need for training data. More precisely, we propose a modification of the Monte Carlo Tree Search (MCTS) algorithm to retrieve objects and room layouts from noisy RGB-D scans. While MCTS was developed as a game-playing algorithm, we show that it can also be used for complex perception problems. Our adapted MCTS algorithm has few, easy-to-tune hyperparameters and can optimise general losses. We use it to optimise the posterior probability of object and room layout hypotheses given the RGB-D data. This results in an analysis-by-synthesis approach that explores the solution space by rendering the current solution and comparing it to the RGB-D observations. To make this exploration more efficient, we propose simple changes to the tree construction and exploration policy of standard MCTS. We demonstrate our approach on the ScanNet dataset. Our method often retrieves configurations that are better than some manual annotations, especially for layouts.
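The search described in the abstract can be illustrated with a minimal sketch of textbook MCTS with UCB1 selection. It is not the modified tree construction or exploration policy proposed in the paper. It assumes that candidate hypotheses are organised into groups with one choice per tree level, and that a user-supplied `score` function stands in for the render-and-compare objective. The names `Node`, `ucb1`, `mcts_select`, `groups` and `score` are all hypothetical.

```python
import math
import random


class Node:
    """One tree node: a single hypothesis choice at a given depth."""

    def __init__(self, depth, choice=None, parent=None):
        self.depth = depth      # number of groups decided so far
        self.choice = choice    # hypothesis picked to reach this node
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total = 0.0        # sum of scores backed up through this node


def ucb1(child, parent_visits, c=1.4):
    """Standard UCB1: mean score plus an exploration bonus."""
    if child.visits == 0:
        return float("inf")     # unvisited children are always tried first
    bonus = c * math.sqrt(math.log(parent_visits) / child.visits)
    return child.total / child.visits + bonus


def mcts_select(groups, score, iterations=500, seed=0):
    """Pick one hypothesis per group so that score(path) is as high as possible.

    groups: list of lists of candidate hypotheses (assumed structure).
    score:  callable on a full list of choices; a stand-in for the
            render-and-compare objective, ideally scaled to [0, 1].
    """
    rng = random.Random(seed)
    root = Node(depth=0)
    best_choice, best_score = None, float("-inf")
    for _ in range(iterations):
        node, path = root, []
        # Selection and expansion: descend by UCB1 until a new node is reached.
        while node.depth < len(groups):
            if not node.children:
                node.children = [Node(node.depth + 1, h, node)
                                 for h in groups[node.depth]]
            parent_visits = max(node.visits, 1)
            node = max(node.children, key=lambda ch: ucb1(ch, parent_visits))
            path.append(node.choice)
            if node.visits == 0:
                break
        # Simulation: complete the remaining groups with random choices.
        for group in groups[len(path):]:
            path.append(rng.choice(group))
        value = score(path)
        if value > best_score:
            best_choice, best_score = list(path), value
        # Backpropagation: update statistics from the reached node to the root.
        while node is not None:
            node.visits += 1
            node.total += value
            node = node.parent
    return best_choice, best_score


if __name__ == "__main__":
    # Toy stand-in for the observations: each hypothesis is just a number,
    # and the score rewards choices close to the "observed" values.
    observed = [3, 7, 1]
    groups = [[1, 3, 5], [6, 7, 9], [0, 1, 2]]

    def score_fit(path):
        return 1.0 / (1 + sum(abs(a - b) for a, b in zip(path, observed)))

    print(mcts_select(groups, score_fit))
```

In an analysis-by-synthesis setting, `score` would render the selected objects and layout components and compare the result with the RGB-D observations. The toy score above only mimics that interface.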