Learning-based methods for training embodied agents typically require a large number of high-quality scenes that contain realistic layouts and support meaningful interactions. However, current simulators for Embodied AI (EAI) challenges only provide simulated indoor scenes with a limited number of layouts. This paper presents Luminous, the first research framework that employs state-of-the-art indoor scene synthesis algorithms to generate large-scale simulated scenes for Embodied AI challenges. Further, we automatically and quantitatively evaluate the quality of generated indoor scenes via their ability to support complex household tasks. Luminous incorporates a novel scene generation algorithm, Constrained Stochastic Scene Generation (CSSG), which achieves performance competitive with human-designed scenes. Within Luminous, the EAI task executor, task instruction generation module, and video rendering toolkit can collectively generate a massive multimodal dataset of new scenes for the training and evaluation of Embodied AI agents. Extensive experimental results demonstrate the effectiveness of the data generated by Luminous, enabling the comprehensive assessment of embodied agents on generalization and robustness.