Object compositing based on 2D images is a challenging problem, since producing realistic results typically involves multiple processing stages such as color harmonization, geometry correction, and shadow generation. Furthermore, annotating training data pairs for compositing requires substantial manual effort from professionals and is hard to scale. Leveraging recent advances in generative models, we therefore propose a self-supervised framework for object compositing built on conditional diffusion models. Our framework addresses the object compositing task holistically in a unified model, transforming the viewpoint, geometry, color, and shadow of the generated object while requiring no manual labeling. To preserve the input object's characteristics, we introduce a content adaptor that helps maintain categorical semantics and object appearance. A data augmentation method is further adopted to improve the fidelity of the generator. In a user study on diverse real-world images, our method outperforms relevant baselines in both the realism and the faithfulness of the synthesized images.
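To make the high-level pipeline concrete, the following is a minimal, self-contained sketch (not the paper's implementation) of the idea summarized above: an object-appearance embedding is passed through a "content adaptor" that produces conditioning tokens for a denoising network, which fills a masked region of the background with the harmonized object. All module names, dimensions, the toy denoiser, and the simplified noise schedule are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ContentAdaptor(nn.Module):
    """Maps a single object-appearance embedding (e.g. from an image encoder)
    to a small sequence of conditioning tokens that carry identity cues."""
    def __init__(self, embed_dim=512, token_dim=256, num_tokens=4):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(embed_dim, token_dim * num_tokens),
            nn.GELU(),
            nn.Linear(token_dim * num_tokens, token_dim * num_tokens),
        )
        self.num_tokens, self.token_dim = num_tokens, token_dim

    def forward(self, obj_embed):                              # (B, embed_dim)
        tokens = self.proj(obj_embed)
        return tokens.view(-1, self.num_tokens, self.token_dim)  # (B, N, D)

class ToyConditionalDenoiser(nn.Module):
    """Stand-in for the conditional diffusion U-Net: predicts the noise added
    to the composite, conditioned on the pooled content-adaptor tokens."""
    def __init__(self, image_dim=3 * 64 * 64, token_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(image_dim * 2 + token_dim + 1, 1024),    # noisy img + masked bg + cond + t
            nn.SiLU(),
            nn.Linear(1024, image_dim),
        )

    def forward(self, noisy, masked_bg, cond_tokens, t):
        cond = cond_tokens.mean(dim=1)                          # (B, token_dim)
        x = torch.cat([noisy.flatten(1), masked_bg.flatten(1), cond, t[:, None]], dim=1)
        return self.net(x).view_as(noisy)

# One self-supervised training step: the "object" comes from the target image
# itself (its region is masked out), so no manual compositing labels are needed.
adaptor, denoiser = ContentAdaptor(), ToyConditionalDenoiser()
target = torch.rand(2, 3, 64, 64)                  # real image containing the object
mask = torch.zeros(2, 1, 64, 64)
mask[:, :, 16:48, 16:48] = 1.0                     # region where the object should appear
masked_bg = target * (1 - mask)                    # background with the object region erased
obj_embed = torch.randn(2, 512)                    # placeholder for an image-encoder embedding

t = torch.rand(2)                                  # diffusion timestep in [0, 1] (simplified schedule)
noise = torch.randn_like(target)
noisy = torch.sqrt(1 - t.view(-1, 1, 1, 1)) * target + torch.sqrt(t.view(-1, 1, 1, 1)) * noise

pred = denoiser(noisy, masked_bg, adaptor(obj_embed), t)
loss = ((pred - noise) ** 2).mean()                # standard epsilon-prediction objective
loss.backward()
print(float(loss))
```

In this toy setup the conditioning tokens are simply pooled and concatenated; in a full diffusion model they would typically be injected via cross-attention inside the denoising network.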