Humans universally dislike the task of cleaning up a messy room. If machines are to help us with this task, they must understand human criteria for regular arrangements, such as several types of symmetry, co-linearity or co-circularity, spacing uniformity in linear or circular patterns, and further inter-object relationships that relate to style and functionality. Previous approaches to this task relied on human input to explicitly specify a goal state, or synthesized scenes from scratch, but such methods do not address the rearrangement of existing messy scenes when no goal state is provided. In this paper, we present LEGO-Net, a data-driven, transformer-based iterative method for learning regular rearrangement of objects in messy rooms. LEGO-Net is partly inspired by diffusion models: it starts with an initial messy state and iteratively "de-noises" the position and orientation of objects to a regular state while reducing the distance traveled. Given randomly perturbed object positions and orientations in an existing dataset of professionally arranged scenes, our method is trained to recover a regular re-arrangement. Results demonstrate that our method reliably rearranges room scenes and outperforms other methods. We additionally propose a metric for evaluating regularity in room arrangements using number-theoretic machinery.
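To make the perturb-then-denoise idea concrete, the following is a minimal sketch and not the LEGO-Net implementation. It assumes a scene is a list of hypothetical `(x, y, theta)` tuples, builds a messy input by adding Gaussian noise, and then applies small iterative updates. The `make_oracle_denoiser` helper is a stand-in that already knows the clean scene, which the actual method does not. It takes the place of the learned transformer only so that the loop is runnable. The sketch also ignores angle wrap-around and does not model the distance-traveled objective.

```python
import math
import random

# Assumption: a scene is a list of (x, y, theta) tuples, i.e. a 2D position
# plus an orientation in radians. This representation is illustrative only.

def perturb(scene, pos_sigma, ang_sigma, rng):
    """Create a messy scene by adding Gaussian noise to a clean one."""
    return [
        (x + rng.gauss(0.0, pos_sigma),
         y + rng.gauss(0.0, pos_sigma),
         theta + rng.gauss(0.0, ang_sigma))
        for x, y, theta in scene
    ]

def rearrange(scene, denoiser, steps=50, step_size=0.2):
    """Iteratively move each object a fraction of the predicted update."""
    state = list(scene)
    for _ in range(steps):
        deltas = denoiser(state)  # one (dx, dy, dtheta) per object
        state = [
            (x + step_size * dx, y + step_size * dy, theta + step_size * dt)
            for (x, y, theta), (dx, dy, dt) in zip(state, deltas)
        ]
    return state

def make_oracle_denoiser(clean):
    """Hypothetical stand-in for a learned model: it points straight at a
    known clean scene. A trained network would have no such access."""
    def denoiser(state):
        return [
            (cx - x, cy - y, ct - t)
            for (x, y, t), (cx, cy, ct) in zip(state, clean)
        ]
    return denoiser

# Three objects evenly spaced on a line, all facing the same way.
clean = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
messy = perturb(clean, pos_sigma=0.3, ang_sigma=0.5, rng=random.Random(0))
tidy = rearrange(messy, make_oracle_denoiser(clean))

# Largest remaining per-coordinate deviation from the clean arrangement.
max_err = max(abs(a - b) for p, q in zip(tidy, clean) for a, b in zip(p, q))
```

With a step size of 0.2, each iteration shrinks the remaining error by a factor of 0.8, so 50 steps bring the scene very close to the clean arrangement. In a learned setting the denoiser would be trained on pairs produced by something like `perturb`.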