In this paper, we introduce HDhuman, a method for novel view rendering of human performers wearing clothes with complex texture patterns from a sparse set of camera views. Although some recent works have achieved remarkable rendering quality on humans with relatively uniform textures using sparse views, the rendering quality remains limited for complex texture patterns because these methods cannot recover the high-frequency geometric details observed in the input views. To this end, the proposed HDhuman combines a human reconstruction network with a pixel-aligned spatial transformer and a rendering network that uses geometry-guided pixel-wise feature integration to achieve high-quality human reconstruction and rendering. The pixel-aligned spatial transformer computes correlations between the input views, producing reconstruction results with high-frequency details. Based on the reconstructed surface, geometry-guided pixel-wise visibility reasoning guides multi-view feature integration, enabling the rendering network to render high-quality images at 2K resolution on novel views. Unlike previous neural rendering works that must train or fine-tune an independent network for each new scene, our method is a general framework that generalizes to novel subjects. Experiments show that our approach outperforms all prior generic and scene-specific methods on both synthetic and real-world data.
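To make the geometry-guided pixel-wise feature integration concrete, the sketch below shows one plausible realization: each surface point is projected into every input view, a depth test against surface depth rendered from the reconstruction decides per-pixel visibility, and pixel-aligned features are averaged over the visible views only. This is a minimal illustration under stated assumptions, not the paper's exact implementation; the tensor layout, the `integrate_features` name, and the occlusion tolerance `eps` are all hypothetical.

```python
# Minimal sketch of geometry-guided, visibility-weighted multi-view feature
# integration (assumed formulation; names and eps value are illustrative).
import torch
import torch.nn.functional as F

def integrate_features(points, feats, K, R, t, depth_maps, eps=5e-3):
    """Blend pixel-aligned features from V views for N surface points.

    points:     (N, 3)       surface points in world coordinates
    feats:      (V, C, H, W) per-view image feature maps
    K, R, t:    (V, 3, 3), (V, 3, 3), (V, 3) camera parameters
    depth_maps: (V, H, W)    depth rendered from the reconstructed surface
    eps:        depth-test tolerance (hypothetical value)
    """
    V, C, H, W = feats.shape
    # Project points into every view: x_cam = R @ x + t
    cam = torch.einsum('vij,nj->vni', R, points) + t[:, None, :]  # (V, N, 3)
    z = cam[..., 2].clamp(min=1e-6)                               # point depth per view
    pix = torch.einsum('vij,vnj->vni', K, cam)                    # (V, N, 3)
    uv = pix[..., :2] / pix[..., 2:3]                             # pixel coordinates
    # Normalize to [-1, 1] for grid_sample
    grid = torch.stack([uv[..., 0] / (W - 1),
                        uv[..., 1] / (H - 1)], dim=-1) * 2 - 1    # (V, N, 2)
    # Sample pixel-aligned features and rendered surface depth
    f = F.grid_sample(feats, grid[:, :, None, :],
                      align_corners=True)[..., 0]                 # (V, C, N)
    d = F.grid_sample(depth_maps[:, None], grid[:, :, None, :],
                      align_corners=True)[:, 0, :, 0]             # (V, N)
    # Pixel-wise visibility: a view sees the point iff its depth matches the
    # surface depth rendered in that view (i.e., nothing occludes it)
    vis = (torch.abs(z - d) < eps).float()                        # (V, N)
    w = vis / vis.sum(dim=0, keepdim=True).clamp(min=1.0)         # normalized weights
    return (f * w[:, None, :]).sum(dim=0)                         # (C, N) blended feature
```

Restricting the average to visible views is what keeps features from occluded cameras out of the blend; a learned integration could further weight the visible views, but the depth-test gating above captures the core idea of using the reconstructed geometry to guide feature integration.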