In this work, we tackle a new task, hand-object interaction image generation, which aims to conditionally generate a hand-object image given the hand, the object, and their interaction status. This task is challenging and worth studying for many potential applications, such as AR/VR games and online shopping. To address it, we propose a novel HOGAN framework, which utilizes an expressive model-aware hand-object representation and leverages its inherent topology to build a unified surface space. In this space, we explicitly account for the complex self- and mutual occlusion that arises during interaction. For the final image synthesis, we account for the different characteristics of the hand and the object and generate the target image in a split-and-combine manner. For evaluation, we build a comprehensive protocol to assess both the fidelity and the structure preservation of the generated images. Extensive experiments on two large-scale datasets, HO3Dv3 and DexYCB, demonstrate the effectiveness and superiority of our framework both quantitatively and qualitatively. The project page is available at https://play-with-hoi-generation.github.io/.