Naturally controllable human-scene interaction (HSI) generation plays an important role in various fields, such as VR/AR content creation and human-centered AI. However, the controllability offered by existing methods is unnatural and unintuitive, which severely limits their practical application. We therefore focus on the challenging task of naturally and controllably generating realistic and diverse HSIs from textual descriptions. From the perspective of human cognition, an ideal generative model should correctly reason about spatial relationships and interactive actions. To that end, we propose Narrator, a novel relationship-reasoning-based generative approach that uses a conditional variational autoencoder for naturally controllable generation given a 3D scene and a textual description. We model global and local spatial relationships in the 3D scene and the textual description, respectively, based on scene graphs, and introduce a part-level action mechanism that represents interactions as atomic body-part states. Benefiting from this relationship reasoning, we further propose a simple yet effective multi-human generation strategy, which is the first exploration of controllable multi-human scene interaction generation. Extensive experiments and perceptual studies show that Narrator can controllably generate diverse interactions and significantly outperforms existing works. The code and dataset will be made available for research purposes.
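For reference, the standard training objective of a conditional variational autoencoder (CVAE) is the evidence lower bound written below. This is a generic sketch, not Narrator's stated loss: it assumes that $x$ denotes the generated human body (for example, its pose), that the condition $c$ bundles the 3D scene and the textual description, and that $z$ is the latent code. The actual objective may differ or include additional terms not described here.

```latex
\mathcal{L}(\theta, \phi; x, c)
  = \mathbb{E}_{q_\phi(z \mid x, c)}\big[\log p_\theta(x \mid z, c)\big]
  - D_{\mathrm{KL}}\big(q_\phi(z \mid x, c) \,\|\, p(z \mid c)\big)
```

Training maximizes $\mathcal{L}$ with the encoder $q_\phi$ and decoder $p_\theta$. At generation time the encoder is dropped: a latent $z$ is sampled from the prior $p(z \mid c)$, often simplified to $\mathcal{N}(0, I)$, and decoded together with $c$. This sampling step is what lets a single scene-and-text condition yield diverse interactions.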