We explore how a general AI algorithm can be used for 3D scene understanding in order to reduce the need for training data. More precisely, we propose a modification of the Monte Carlo Tree Search (MCTS) algorithm to retrieve objects and room layouts from noisy RGB-D scans. While MCTS was developed as a game-playing algorithm, we show that it can also be used for complex perception problems. It has few hyperparameters, all of them easy to tune, and it can optimise general losses. We use it to optimise the posterior probability of object and room layout hypotheses given the RGB-D data. This results in an analysis-by-synthesis approach that explores the solution space by rendering the current solution and comparing it to the RGB-D observations. To perform this exploration even more efficiently, we propose simple changes to the tree construction and exploration policy of standard MCTS. We demonstrate our approach on the ScanNet dataset. Our method often retrieves configurations that are better than some of the manual annotations, especially for layouts.
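To make the search formulation concrete, the following is a minimal sketch of standard UCB1-based MCTS applied to hypothesis selection. It does not include the modified tree construction and exploration policy proposed here. Each tree level is assumed to correspond to one set of mutually exclusive candidates (including the option of selecting none), and a complete path through the tree is one scene configuration. The names `mcts_select`, `toy_score`, `PROPOSALS` and `OBSERVED` are hypothetical, and `toy_score` is only a stand-in for the render-and-compare objective: candidates are sets of occupied cells on a 1D grid rather than rendered 3D objects.

```python
import math
import random


class Node:
    """One partial configuration: the choices made for levels 0..depth-1."""

    def __init__(self, depth, untried, parent=None, choice=None):
        self.depth = depth
        self.untried = list(untried)  # candidates not yet expanded
        self.parent = parent
        self.choice = choice
        self.children = []
        self.visits = 0
        self.total = 0.0


def mcts_select(proposals, score, iterations=2000, c=1.4, seed=0):
    """Plain UCB1 MCTS; returns the best full configuration seen."""
    rng = random.Random(seed)
    depth_max = len(proposals)
    root = Node(0, proposals[0])
    best_cfg, best_score = None, -math.inf
    for _ in range(iterations):
        node, cfg = root, []
        # Selection: descend while the node is fully expanded.
        while node.depth < depth_max and not node.untried:
            parent = node
            node = max(
                parent.children,
                key=lambda ch: ch.total / ch.visits
                + c * math.sqrt(math.log(parent.visits) / ch.visits),
            )
            cfg.append(node.choice)
        # Expansion: add one untried candidate for the next level.
        if node.depth < depth_max:
            choice = node.untried.pop(rng.randrange(len(node.untried)))
            nxt = node.depth + 1
            child = Node(nxt, proposals[nxt] if nxt < depth_max else [],
                         parent=node, choice=choice)
            node.children.append(child)
            node = child
            cfg.append(choice)
        # Simulation: complete the configuration at random.
        for level in range(len(cfg), depth_max):
            cfg.append(rng.choice(proposals[level]))
        value = score(cfg)
        if value > best_score:
            best_cfg, best_score = list(cfg), value
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.total += value
            node = node.parent
    return best_cfg, best_score


def toy_score(cfg, observed):
    """Stand-in objective in [0, 1]: IoU with the observation,
    penalised by cells claimed by more than one chosen candidate."""
    chosen = [cand for cand in cfg if cand is not None]
    covered = set().union(*chosen)
    overlap = sum(len(cand) for cand in chosen) - len(covered)
    inter = len(covered & observed)
    union = len(covered | observed)
    return max(0.0, (inter - overlap) / union)


# Hypothetical toy data: occupied cells and three groups of candidates.
OBSERVED = {0, 1, 2, 3, 6, 7}
PROPOSALS = [
    [None, frozenset({0, 1, 2, 3}), frozenset({0, 1})],
    [None, frozenset({2, 3, 4}), frozenset({6, 7})],
    [None, frozenset({5, 6, 7}), frozenset({3, 4})],
]

best_cfg, best_score = mcts_select(
    PROPOSALS, lambda cfg: toy_score(cfg, OBSERVED)
)
```

In this sketch the objective is evaluated only on complete configurations, which mirrors the analysis-by-synthesis idea of scoring a full candidate scene against the observations; any bounded score function could be substituted for `toy_score`.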