A graph representation of the objects in a scene and their relations, known as a scene graph, provides a precise and discernible interface for manipulating the scene by modifying the nodes or edges of the graph. Although existing works have shown promising results in modifying the placement and pose of objects, scene manipulation often loses visual characteristics such as the appearance or identity of objects. In this work, we propose DisPositioNet, a model that learns a disentangled representation for each object, in a self-supervised manner, for the task of image manipulation using scene graphs. Our framework enables the disentanglement of both the variational latent embeddings and the feature representation in the graph. In addition to producing more realistic images thanks to the decomposition of features such as pose and identity, our method takes advantage of probabilistic sampling in the intermediate features to generate more diverse images in object replacement and addition tasks. Our experiments show that disentangling the feature representations in the latent manifold of the model outperforms previous works qualitatively and quantitatively on two public benchmarks. Project Page: https://scenegenie.github.io/DispositioNet/
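To make the idea of manipulating a scene through its graph concrete, the following minimal sketch shows a toy scene graph whose nodes are object labels and whose edges are (subject, predicate, object) triples. It is only an illustration of the data structure: the `SceneGraph` class, its method names, and the example labels are hypothetical and are not taken from the DisPositioNet code, and no image generation is performed.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Toy scene graph (hypothetical, for illustration only).

    Nodes are object labels; edges are (subject_idx, predicate, object_idx).
    """
    objects: list[str] = field(default_factory=list)
    relations: list[tuple[int, str, int]] = field(default_factory=list)

    def add_object(self, label: str) -> int:
        # Append a node and return its index.
        self.objects.append(label)
        return len(self.objects) - 1

    def add_relation(self, subj: int, predicate: str, obj: int) -> None:
        # Append a directed, labeled edge between two existing nodes.
        self.relations.append((subj, predicate, obj))

    def replace_object(self, idx: int, new_label: str) -> None:
        # Node edit: swap the object category, keep all of its edges.
        self.objects[idx] = new_label

    def change_relation(self, subj: int, obj: int, new_predicate: str) -> None:
        # Edge edit: relabel every edge that goes from subj to obj.
        self.relations = [
            (s, new_predicate if (s, o) == (subj, obj) else p, o)
            for s, p, o in self.relations
        ]

    def triples(self) -> list[tuple[str, str, str]]:
        # Human-readable view of the edges.
        return [(self.objects[s], p, self.objects[o]) for s, p, o in self.relations]


# Build a small source graph.
graph = SceneGraph()
girl = graph.add_object("girl")
horse = graph.add_object("horse")
graph.add_relation(girl, "riding", horse)

# Three kinds of edits: relationship change, object replacement, object addition.
graph.change_relation(girl, horse, "beside")
graph.replace_object(horse, "bike")
hat = graph.add_object("hat")
graph.add_relation(girl, "wearing", hat)

print(graph.triples())
```

In a graph-based manipulation model, an edited graph of this kind would be the input from which a new image is generated; the sketch stops at the graph edit itself.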